Hamish wrote:

> > > I would like to know the planned changes for the raster library,
> > > especially the random access of pixels in the raster.
>
> Markus:
> > Not sure if all of those are actually planned, but here is a list:
> > http://grass.osgeo.org/wiki/GRASS_7_ideas_collection#Raster
> >
> > > I wanted to work on it some months back, but my daily job got more
> > > intense.
> > > In the coming future, we will need to access easily any row for
> > > parallel processing.
>
> One thing I wonder about for parallel processing of (serial) raster
> modules- do we really need random read access to send each individual row
> into a separate thread? The overhead with that seems counter-productive.
> Couldn't we read some GRASS_NPROC envrio variable and then split the
> overall number of rows by that number and create a small number of
> threads, ie matching the system.

If you just want to speed up top-to-bottom processing, that doesn't
require random access, just a scrolling window (which several modules
already use, either via rowio or with their own cache).

For random access, the main issue is that you want to avoid performing
the decompression, format conversion and resampling steps more than
once. In practice, this means making a temporary "raw" copy of the
data, and then caching it.

Exactly how you cache it depends upon your expected access pattern.
For truly random access, you probably want to cache it in rows. Where
there is some degree of locality, tiles will tend to produce better
results.

> another thing I still wonder about (see thread from a month or so back)
> is where to start? Modify the libs to support the concept, then tackle
> each module on their own? ie concentrate on the non-I/O limited and
> can't-do-much-about-it but throw more processor at the problem modules,
> and leave non-number crunching modules alone? -- concentrate on areas
> where we'll get the most bang for the buck / pick off low hanging fruit /
> etc?

It depends upon whether we want to make the raster I/O operations
thread-safe. If we do, that could involve a significant amount of
work, particularly if we don't want to reduce efficiency.

One efficiency issue is that the library keeps a decompressed copy of
the last row which was read. This means that if you're up-sampling the
data (the current region has finer resolution than the raster),
adjacent rows which correspond to the same source row don't require
reading and de-compressing the data.

[However, the re-sampling and the conversion to the requested type
(CELL/FCELL/DCELL) are repeated for each row. Even though it's almost
inevitable, it isn't actually guaranteed that you'll request the same
format or the same resolution for each row.]

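To make the effect of the single cached row concrete, here is a minimal sketch in C. It is not the library's code: the function names are hypothetical, and it assumes a plain nearest-neighbour mapping from region rows to source rows using integer arithmetic, which may differ in detail from what the library actually does. It only counts how many source rows would have to be read and decompressed during one top-to-bottom pass.

```c
#include <assert.h>

/* Hypothetical illustration only; none of these names are GRASS library API. */

/* Map an output (region) row to the source (raster) row it samples.
 * Assumption: nearest-neighbour mapping with integer arithmetic. */
int source_row(int out_row, int out_rows, int src_rows)
{
    assert(out_rows > 0 && src_rows > 0);
    assert(out_row >= 0 && out_row < out_rows);
    return (int)(((long long)out_row * src_rows) / out_rows);
}

/* Count how many source rows must be read and decompressed for a
 * top-to-bottom pass when only the most recent source row is cached. */
int count_source_reads(int out_rows, int src_rows)
{
    int cached = -1;    /* the single slot: index of the last decompressed row */
    int reads = 0;

    for (int r = 0; r < out_rows; r++) {
        int s = source_row(r, out_rows, src_rows);

        if (s != cached) {      /* cache miss: read + decompress */
            cached = s;
            reads++;
        }
        /* resampling and type conversion would still run here for every r */
    }
    return reads;
}
```

With these definitions, count_source_reads(10, 5) comes to 5: when 5 source rows are up-sampled 2:1 into 10 region rows, each source row is decompressed once rather than twice. At 1:1 the slot never hits, so count_source_reads(10, 10) is 10.
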
If you are trying to parallelise a top-to-bottom module, and one
thread requests a row that is in the middle of being read by another
thread, should it perform a redundant read, or simply wait for the
original thread to de-compress the row?

Also, the approach of having a single "slot" for the most recent row
won't extend to multiple threads. E.g. if you have 10 threads and
you're up-scaling the data 2:1, you would need 5 slots (each source
row will be consumed by two threads).

Parallelising the output is simpler. However, if you want to support
compressed files, there would need to be a critical section so that
each thread can reliably determine the offset at which its data is
written. Regardless of whether you want compressed files, if you don't
have pwrite(), you would need to make lseek() + write() into a
critical section.

If you have pwrite() and don't need compressed files, there are no
inherent concurrency issues. There might be issues with the existing
code using pre-allocated buffers, but those can be fixed.

BTW, for 7.x, can we assume that alloca() is available? It would make
it much easier to write re-entrant code by avoiding the need to
pre-allocate buffers (the alternative is lots of calls to malloc/free,
which could be a significant performance hit).

--
Glynn Clements
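
As an illustration of the output case discussed above, here is a minimal sketch in C of the two ways a thread could write an uncompressed row. All function names are hypothetical and none of this is existing GRASS code. It assumes fixed-size rows, so the offset follows directly from the row number, and it ignores short writes, which real code would have to loop on.

```c
#define _XOPEN_SOURCE 700   /* for pwrite()/pread() under strict -std= modes */

#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper names; not part of the GRASS raster library. */

/* Uncompressed rows all have the same size, so the offset of a row
 * depends only on its index. */
off_t row_offset(int row, size_t row_bytes)
{
    assert(row >= 0);
    return (off_t)row * (off_t)row_bytes;
}

/* Open (or create and truncate) an output file for reading and writing. */
int open_output(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* With pwrite() the offset is passed explicitly and the shared file
 * position is never touched, so no locking is needed here. */
ssize_t write_row_pwrite(int fd, int row, const void *buf, size_t row_bytes)
{
    return pwrite(fd, buf, row_bytes, row_offset(row, row_bytes));
}

/* Without pwrite(), lseek() + write() both use the shared file position,
 * so the pair has to run inside a critical section. */
static pthread_mutex_t seek_write_lock = PTHREAD_MUTEX_INITIALIZER;

ssize_t write_row_locked(int fd, int row, const void *buf, size_t row_bytes)
{
    ssize_t n = -1;

    pthread_mutex_lock(&seek_write_lock);
    if (lseek(fd, row_offset(row, row_bytes), SEEK_SET) != (off_t)-1)
        n = write(fd, buf, row_bytes);
    pthread_mutex_unlock(&seek_write_lock);
    return n;
}
```

Either function can write rows in any order. The compressed case is not covered: there the offset of a row depends on the compressed sizes of the rows written before it, which is why it needs a critical section regardless of pwrite().
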
