On Sun, Jan 4, 2015 at 10:45 PM, Fábio Dias <[email protected]> wrote:
> As promised, profile of v.generalize, as of r63952.
> (The data might not be exactly the same, I might have run v.clean somewhere).

Thanks for your thorough code analysis! My initial guess was wrong:
Vect_line_intersection2() is not the limiting factor. The R tree is
also used to feed Vect_line_intersection2(), but it does not seem to be
a bottleneck here. The limit was Vect_rewrite_line() and the functions
called by it.

I have optimized the GRASS vector library in trunk r64032 and added
another topology check to v.generalize in trunk r64033. The profile of
v.generalize now shows that it is limited by disk I/O speed (on my
laptop with a standard laptop-like spinning HDD), which means that the
algorithms are, under the test conditions, close to their optimum. This
picture might change as soon as you use a high-performance server or an
SSD.

The speed improvement is non-linear: for small datasets such as the
official GRASS datasets, you won't notice a difference. For one tile of
Terraclass, the processing speed should be about 2-4 times faster than
before. For the full Terraclass dataset, the processing speed could be
>10 times faster than before.

You will need to wait until, say, 10% of the processing has been
reached in order to estimate the total processing time. Each line takes
a different amount of time to simplify, so the processing time for 100%
is not simply 100 times the processing time for 1%.

Another user has applied v.generalize to NLCD2011 and it took nearly 2
months. Your dataset is probably a bit smaller, but the Terraclass
shapefiles are full of errors. If you want to fix these errors, this
will take some time.

I recommend testing the new v.generalize first on a subregion of
Terraclass. Proceed with the full dataset only if the processing speed
and the results are acceptable. Otherwise, please report.

Markus M

>
> I still have the raw profiles, if anyone wants them.
>
> F
> -=--=-=-
> Fábio Augusto Salve Dias
> http://sites.google.com/site/fabiodias/
>
>
> On Sun, Jan 4, 2015 at 6:01 PM, Fábio Dias <[email protected]> wrote:
>> Attached is pdf generated with google-perf of v.generalize, using
>> g7b4.
>> I'm running it again for trunk.
>> -=--=-=-
>> Fábio Augusto Salve Dias
>> http://sites.google.com/site/fabiodias/
>>
>>
>> On Sun, Jan 4, 2015 at 5:54 PM, Markus Metz
>> <[email protected]> wrote:
>>> On Wed, Dec 31, 2014 at 5:20 PM, Fábio Dias <[email protected]> wrote:
>>>>
>>>> I fussed about the v.generalize code, thinking about pthread
>>>> parallelization. The geometry part of the code is *really* fast and
>>>> can be easily parallelized so it can run even faster. But, according
>>>> to the profile google-perf gave me, the real bottleneck is inside the
>>>> check_topo function (which uses static vars and inserts a new line
>>>> into the vector, not only checks if it breaks topo - got stuck a while
>>>> in there due to the misnomer). More specifically in the Rtree function
>>>> used to check if one line intersects other lines.
>>>>
>>>
>>> The function used in check_topo is Vect_line_intersection() which does
>>> much more than just testing for intersections. The process could be
>>> made much faster if Vect_line_check_intersection() would be modified
>>> such that connections by end points are ignored. But I don't know if
>>> this would break other modules or other functionality.
>>>
>>> Markus M
_______________________________________________
grass-user mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-user
