> I have optimized the GRASS vector library in trunk r64032 and added
> another topology check to v.generalize in trunk r64033. The profile of
> v.generalize now shows that it is limited by disk I/O speed (on my
> laptop with a standard laptop-like spinning HDD), which means that the
> algorithms are, under the test conditions, close to their optimum.
> This picture might change as soon as you use a high-performance server
> or a SSD.

Then I should profile my current setup. My grassdata dir is not on a disk but on a mounted ramdisk, which is basically RAM, i.e. really, really fast. It should be interesting.

By the way, this is really easy to set up, at least on Linux, and it should really improve performance for big datasets. Obviously you'd need a big machine too, but well, a big nail needs a big hammer.

    cd ~
    mkdir -p grassdata
    sudo mount -t tmpfs -o size=512M tmpfs grassdata

The size= option sets the size of the ramdisk; 512M is just an example. In my case the machine has 128 GB of RAM, so I made a 32 GB ramdisk. Each vector directory takes 6 GB, so that is plenty. Of course, the data will be lost if you shut down or reboot the machine, so extra care is needed. I did not compare the results with and without the ramdisk, btw.

> The speed improvement is non-linear: for small datasets as in the
> official GRASS datasets, you won't notice a difference. For one tile
> of Terraclass, the processing speed should be about 2-4 times faster
> than before. For the full Terraclass dataset, the processing speed
> could be >10 times faster than before. You will need to wait until say
> 10% of the processing has been reached in order to estimate the total
> processing time. Simplifying each line takes its own time, therefore
> the processing time of 100% is not equal to 100 x the processing time
> of 1%.

Indeed, but it was a (very) rough approximation.

> Another user has applied v.generalize to NLCD2011 and it took nearly 2
> months. Your dataset is probably a bit smaller, but the Terraclass
> shapefiles are full of errors. If you want to fix these errors, this
> will take some time.

You know this dataset? The errors are really bugging me. They are mostly due to the process/tools they usually use. We have passed along the request for a more topologically correct approach. Maybe on the next iteration. But I'll create another thread asking for advice regarding these errors shortly :)

> I recommend to test the new v.generalize first on a subregion of
> Terraclass. Only if the processing speed and the results are
> acceptable, proceed with the full dataset. Otherwise, please report.

Testing before deploying? Where's the fun in that? :)

Joking aside, I did that before trying the full dataset. I did, however, interrupt the processing to start over with the new trunk version, because you said it would be faster. And indeed it is, thank you very much.

By not first dissolving the original data and then running v.clean tool=break on it, I've reduced the processing time from more than 30 h for 1% to 24 h for 11%. With the latest release, it reached 9% in 18 h.

However, this whole thing got me thinking about what you said in an earlier message:

> The check_topo function can not be executed in parallel because 1)
> topology must not be modified for several boundaries in parallel, 2)
> data are written to disk, and disk IO is by nature not parallel.

Well, disk IO, there's not much we can do about it. On high-end servers (again, I'm thinking big hammers) this shouldn't really be a bottleneck, nor should it lock the threads for long: between the disk speed and the cache, it should barely lock each thread.

Assuming the "vector access" functions are thread safe (which I think they will eventually be; IMHO that would be the first step towards making the whole software "parallel-capable"), we could allow parallel changes to the topology by carefully choosing which lines are considered at a time. One simple example might be lines whose bounding boxes do not intersect. Not sure how much overhead this would cause, or if it would be worth it.

Thanks again,
F

_______________________________________________
grass-user mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-user
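
P.S. To make the bounding-box idea above a bit more concrete, here is a rough sketch in Python (not the C of the vector library). Nothing in it is GRASS API: the BBox type and the boxes_overlap and disjoint_batches functions are names I made up for illustration, and it assumes the bounding box of every line is already known. It greedily sorts lines into batches whose boxes do not overlap each other, so the lines of one batch could in principle be handled by separate threads while the batches run one after another.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    """Axis-aligned bounding box of one line (hypothetical type, not GRASS API)."""
    west: float
    south: float
    east: float
    north: float


def boxes_overlap(a: BBox, b: BBox) -> bool:
    """True if the two boxes share any point; touching edges count as overlap."""
    return (a.west <= b.east and b.west <= a.east
            and a.south <= b.north and b.south <= a.north)


def disjoint_batches(boxes: List[BBox]) -> List[List[int]]:
    """Greedily group line indices so that boxes within a batch never overlap.

    Lines of one batch could be handed to separate threads; the batches
    themselves are still processed one after another.
    """
    batches: List[List[int]] = []
    for i, box in enumerate(boxes):
        for batch in batches:
            if not any(boxes_overlap(box, boxes[j]) for j in batch):
                batch.append(i)
                break
        else:
            # The box overlaps at least one member of every existing batch.
            batches.append([i])
    return batches


if __name__ == "__main__":
    lines = [
        BBox(0, 0, 2, 2),  # line 0
        BBox(1, 1, 3, 3),  # line 1, overlaps line 0
        BBox(5, 5, 6, 6),  # line 2, far away from both
        BBox(2, 2, 4, 4),  # line 3, touches line 0 and overlaps line 1
    ]
    print(disjoint_batches(lines))  # prints [[0, 2], [1], [3]]
```

This naive version compares every line against every batch member, so it is quadratic in the worst case, which is exactly the overhead I am unsure about; real data would need a spatial index for the overlap test. It also only makes sense if a simplified line stays inside its original box, which holds for methods that only remove vertices but probably not for every smoothing method.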
