I'm working on multi-core VMs in a cloud environment that access their data on a central data server via NFS. Parallelizing jobs for different map sheets gives huge accelerations for C programs like gdaladdo, but there seems to be a problem with Python-based programs like rgb2pct.py. Consider the following:

(
    rgb2pct.py file1.tif file1_256.tif
    gdaladdo file1_256.tif 2 4 8 16
)&
(
    rgb2pct.py file2.tif file2_256.tif
    gdaladdo file2_256.tif 2 4 8 16
)&
... etc., one such block per available core
wait

When running this on a 16-core VM I first see 16 Python processes, each with a CPU load of around 20% per processor, and then 16 gdaladdo processes with CPU loads around 95%. When I replace the tif input files for rgb2pct.py with equivalent jpg files, the load of the 16 rgb2pct.py processes increases to about 80% and the overall computing time is more than halved.

So my impression is that one Python I/O process blocks all the others. I have read something about Python's GIL (Global Interpreter Lock, http://docs.python.org/faq/library#can-t-we-get-rid-of-the-global-interpreter-lock) and the multiprocessing module, but I don't see an easy way to apply this to my setup. Does anyone have a simple solution for this problem?
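For what it's worth, since each rgb2pct.py invocation is a separate OS process with its own interpreter, the GIL should not be shared between them; the contention is more likely in I/O. One way to drive the same per-sheet pipeline from a single Python script with the multiprocessing module is sketched below. The sheet names, the pool size of 16, and the exact rgb2pct.py/gdaladdo arguments are assumptions mirroring the shell snippet above, not a tested recipe:

```python
import subprocess
from multiprocessing import Pool

def run_pipeline(commands):
    """Run a list of commands sequentially, aborting on the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)

def process_sheet(name):
    # Hypothetical per-sheet job, mirroring the shell subshells above:
    # color-quantize the sheet, then build overview levels on the result.
    run_pipeline([
        ["rgb2pct.py", f"{name}.tif", f"{name}_256.tif"],
        ["gdaladdo", f"{name}_256.tif", "2", "4", "8", "16"],
    ])
    return name

if __name__ == "__main__":
    sheets = [f"file{i}" for i in range(1, 17)]  # assumed sheet names
    # One worker process per core; map() blocks until all sheets are done,
    # like the shell "wait" above.
    with Pool(processes=16) as pool:
        done = pool.map(process_sheet, sheets)
    print(f"finished {len(done)} sheets")
```

This does not by itself remove any NFS bottleneck, of course; it only replaces the hand-written subshells with a worker pool.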

Jan

_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev