On Wednesday 09 February 2011 09:59:02, Antonio Valentino wrote:
> Hi Francis,
>
> On Wed, 9 Feb 2011 09:38:49 +1100,
> Francis Markham <[email protected]> wrote:
> > Is there a document anywhere specifying the best practices for
> > parallel writes to a GDAL raster? I have an embarrassingly parallel
> > problem that I would like to parallelise with MPI, but I'm not sure
> > what I am allowed to do in parallel. I would like to assign blocks
> > exclusively to worker threads to read and write concurrently, but I
> > am unsure how I might do this safely.
> >
> > Any suggestions would be greatly appreciated.
>
> Are you sure it is the best way to parallelize your problem?
> If you write to the disk, operations are in some way serialized.
> It is a common scheme to have several worker processes/threads for CPU
> bound computations and a single IO process/thread that collects all
> results and writes them to disk.
>
> The IO process/thread in this case can also be designed to implement
> some optimization or caching mechanism that you could not apply at
> worker process level.
>
> With this scheme you can bypass completely the problem of concurrent
> writes without loss of performance IMHO.
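For illustration, the scheme quoted above (several compute workers, one IO thread) could be sketched roughly as follows. This is only a minimal sketch using the Python standard library: the block size, the raster dimensions, compute_block() and the in-memory "raster" are all hypothetical stand-ins, and no GDAL call is actually made. With GDAL, the writer thread would be the only place that touches the dataset, for example through band.WriteArray() in the Python bindings.

```python
import queue
import threading

BLOCK = 4                # hypothetical block width/height, in pixels
WIDTH, HEIGHT = 8, 8     # hypothetical raster size, in pixels


def compute_block(xoff, yoff):
    """CPU-bound stand-in: return BLOCK rows of BLOCK pixel values."""
    return [[(yoff + r) * WIDTH + (xoff + c) for c in range(BLOCK)]
            for r in range(BLOCK)]


def worker(tasks, results):
    """Compute blocks and hand them to the writer; never touch the raster."""
    while True:
        item = tasks.get()
        if item is None:          # sentinel: no more work
            break
        xoff, yoff = item
        results.put((xoff, yoff, compute_block(xoff, yoff)))


def writer(results, raster, n_blocks):
    """The single IO thread: the only code that writes to the raster."""
    for _ in range(n_blocks):
        xoff, yoff, data = results.get()
        # With GDAL, this is where the dataset would be written
        # (assumption: something like band.WriteArray(array, xoff, yoff)).
        for r, row in enumerate(data):
            raster[yoff + r][xoff:xoff + BLOCK] = row


def run(n_workers=3):
    raster = [[None] * WIDTH for _ in range(HEIGHT)]
    offsets = [(x, y) for y in range(0, HEIGHT, BLOCK)
               for x in range(0, WIDTH, BLOCK)]
    tasks, results = queue.Queue(), queue.Queue()

    io_thread = threading.Thread(target=writer,
                                 args=(results, raster, len(offsets)))
    io_thread.start()
    workers = [threading.Thread(target=worker, args=(tasks, results))
               for _ in range(n_workers)]
    for w in workers:
        w.start()

    for off in offsets:
        tasks.put(off)
    for _ in workers:
        tasks.put(None)           # one sentinel per worker

    for w in workers:
        w.join()
    io_thread.join()
    return raster
```

Calling run() returns the fully written raster. Because all writes go through one thread, the order in which workers finish does not matter, and the same layout should carry over to MPI with one rank acting as the writer.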
Yes, I completely agree with the approach you suggest. Genuine concurrent
writing is likely not possible along the whole chain and will result in
queuing at some point, for example at the OS level. (Unless you use some
RAID, in which case, if the file is spread over multiple disks, it is
perhaps possible to have genuinely parallel writing of blocks of data of
the same file.)

Nevertheless, on the GDAL side, you cannot safely use the WriteBlock()
interface on the same RasterBand object from different threads. For
example, in the GeoTIFF driver, for pixel-interleaved images, there is an
optimization that doesn't immediately flush the block to libtiff until
another block is read/written. This optimization is made so that, for a
pixel-interleaved RGB image, you can fill the R, G and B components of a
block into memory without having to flush it to disk each time. The
GTIFFDataset object thus has state variables to maintain the current dirty
block number and the associated buffer. If you try to write from different
threads, they will overwrite this dirty block number and buffer, leading
to chaos.

I'd note that inside libtiff itself there are state variables too, mainly
when compression codecs are involved (but possibly also in the
uncompressed case; I haven't checked).

So if you want to use the WriteBlock() interface, you have to do a full
inspection of the driver code to determine whether it is safe to do so.
And I'm afraid that the most popular drivers aren't.

If you use the RasterIO() interface, you might encounter the issue that
Frank described about the flushes potentially occurring in an
inappropriate thread. I have also identified a few potential race
conditions in the past. See http://trac.osgeo.org/gdal/ticket/3225 and
http://trac.osgeo.org/gdal/ticket/3226. That last one is probably the
situation that Frank described. I'm not sure how/if it can be solved by
identifying an "appropriate" thread.
This supposes that there's somehow an "owner" thread of a dataset, which
is not currently a requirement. What happens if thread A opens a dataset,
writes a few blocks (that go into the block cache), then terminates, and
then thread B does other IO and closes the dataset?

Anyway, if you always use the RasterIO() interface and do not mix
RasterIO() with IReadBlock()/IWriteBlock(), I don't think there are
actually race conditions that could lead to the underlying IWriteBlock()
being called from different threads concurrently (which is bad, see 2nd
paragraph). I will not guarantee it, however: race condition analysis is
terribly difficult. So using RasterIO() from different threads will
serialize the flushing of blocks to the disk, due to the use of the global
cache mutex.

And I doubt that using libtiff/libgeotiff directly will make it easier.

Best regards,

Even
_______________________________________________
gdal-dev mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/gdal-dev
