> > > > Quick question - presumably for VRT datasets any source images currently > > share the global cache and are treated from this proposals' POV as their > > own "datasets"? As well as the VRT being a separate dataset? If so, seems > > like this could be quite a major win for users with VRTs for tiled > > images? > > The GDALRasterBlockManger if done per dataset is a member of the base > GDALDataset Object, so all inherited classes will have it as well. So yes, > I believe that should be the case that each seperate dataset in the VRT > would have its own cache. All of this is assuming you set the global config > option to use a per dataset cache.
This might be a problem in practice since the amount of cache might grow quickly. By default a VRT will open simultaneously up to GDAL_MAX_DATASET_POOL_SIZE (whose default is 100, but can be changed at runtime with the config option) source datasets. If you do a gdal_translate of the VRT (and that it has more than 100 sources) then you will need 100 * GDAL_CACHEMAX bytes for the cache. Of course once you are aware of that you can play with both configuration options to find the best settings. > > > While it's a lot of code change, it's mostly simple/repeated changes. > > > > > Solution 2 (RW Mutex in GDALRasterBlock) > > > Cons: > > > It means that writing a driver is not as trivial as before and care > > > must > > > > be taken in how locking is done within the driver in order to prevent > > deadlocks and maintain thread safety. > > > > I'm wary about making drivers more complex, there's a lot of them. And > > many are unlikely to be tested in multi-threaded anger during initial > > development (and MT issues can be hard to unit test). Can you explain a > > bit further what would be the impact for typical driver styles? Can we > > make the typical case (no/locked multithreaded access) really easy? What > > sort of driver code would be needed for drivers that can do MT reads? MT > > reads and writes? > > I completely understand the being wary about making drivers more complex, > there are a ton of them and I had to edit every single one for my > implementation of Solution 2. However, I am starting to like this solution > more and more. The reason that this solution must be this way, is because > many drivers will call the same block from different bands. This can cause > deadlocks if not careful because two threads on the same dataset could be > in the same processes at different times. To fix this I am passing a mutex > through the IReadBlock, IWriteBlock, and IRasterIO functions. This allows > for a much more fine level of control over the process. It allows the mutex > used to protect the Block Data pointer (that is passed through the > IReadBlock, etc) to be protected only at the times required. For many > drivers its simply blocking the entire time the reading/writing process is > occuring. > > Multi-threading reads should be possible in many cases as long as the > driver code itself is safe for such an implementation. If a locking is used > to protect the file pointer and only allow a single thread to be reading > the file from disk at a time in most drivers they should become thread safe > (also the protection of any global statics in the drivers). This means > increased complexity in most drivers to make them thread safe, but I think > it would be possible. The question is : how many drivers do not require a global lock in their IReadBlock() implementation ? Just throwing an idea : how about a IReadBlock_thread_safe and IReadBlock virtual methods ? The first one would have a default implementation in GDALRasterBand that would basically call the second one with a global mutex. That way drivers that wan't a more fine grained locking could implement their own IReadBlock_thread_safe, whereas the ones that don't care would just implement IReadBlock. The few current call sites of IReadBlock would just need to be updated to call IReadBlock_thread_safe instead. > > Multi-threading writes is a little more complex, and is not well covered in > these changes. Issues can come up with reading and writing at the same > time. Here is an example of a complex situation. > > Thread 1: Writes to Block A located at (0,0), this block is then marked for > deletion from the cache and is intended to be flushed. > > Thread 1: Removes Block A from the Band's block array > > Thread 2: Attempts to Read Block at (0,0). Not finding any blocks it > creates Block B and inserts it to Band Block Array > > Thread 2: Reads from the driver and loads data into Block B. > > Thread 1: Writes data from Block A which is being flushed to the driver. > > Thread 2: Edits Block B, but incorrectly because the changes from Block A > are not included. > > I think that this can be corrected in the code by always requiring that all > flushing through the driver occurs prior to removing the block from the > block array. Just checked my code and this is possible right now in > Solution #2, so it might need to be something that I should fix. > > In a more generally large issue, there are issues with even more simple > things with writing. Imagine someone warping into a band at the same time a > fill operation on a band is being executed, if there is no protection it > would end up with a weird set of values. Personally I think this, is > something we shouldn't attempt to fix currently as this should just be > accepted as a bad way to write a multi-threaded application. Agreed. > If you can > think of any other situations where writing might have issues I really > would like to know. > > Thanks, > > Blake -- Spatialys - Geospatial professional services http://www.spatialys.com _______________________________________________ gdal-dev mailing list [email protected] http://lists.osgeo.org/mailman/listinfo/gdal-dev
