Hello Tiago, Thanks a lot for swift update. I timed the 2.22 against commit bf174831 <https://git.skewed.de/count0/graph-tool/commit/bf174831> (compiled with CXXFLAGS=-O3). That does not give an accurate drill
On my typical ~7e6 vertices graph, for a typical search that reach about 1e5 vertices, the improvement is about 20%. We have 83ms (average over 100 runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly, the bf174831 version is more stable with std = 4ms versus 6ms for the 2.22 release. What I understand from the boost thread <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk> is that we could have only initialization for the whole lifespan of the program. If I get it right, their proposal would, prior to a search, to "re-set" the values from distances vector and predecessors vector only for the reached vertices of the last search. I think it worth a try because, from what I timed the python side of initialization could cost around 18ms (5ms for distance map, 13ms for predecessor map) see timeit snipet below. import timeit # pred map timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.arange(7e6)', number=1000)/1000 # distances timeit.timeit(setup='import numpy as np; x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.inf)', number=1000)/1000 I did the above timings on a modern work station, to switch to a cloud provider induces a non negligible overhead (+63% for the distances / +469% for the pred map). What do you think ? F. Le mer. 28 juin 2017 à 18:58, Tiago de Paula Peixoto <[email protected]> a écrit : > On 28.06.2017 15:00, François Kawala wrote: > > > > I try to dig up an old idea that we've discussed previously (the thread > is > > here > > < > http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html > >). > > > > summary: when one runs shortest distance on a graph with several million > > vertices the distance initialization cost is high with respect to the > actual > > Dijsktra cost. We want to circumvent the initialization [see L326 of > > src/graph/topology/graph_distance.cc > > < > https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326 > >] > > > > The proposed approach (as per this boost thread > > <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is > to > > use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance > > initialization is done "on the fly" by the `examine_edge` function. More > > precisely, the Dijkstra visitor maintains a set of discovered vertices. > This > > set is read in the `examine_edge` function to set the distance to > infinity > > when a the target vertex does not belong to the set of discovered > vertices. > > The `examine_edge` function is also responsible to update the set of > > discovered vertices. > > I've pushed a commit: > > https://git.skewed.de/count0/graph-tool/commit/bf174831 > > that replaces the search functions by no_init ones. The initialization is > still done, but only once via a single numpy call in the Python side of > things. > > The speed of the numpy call should be comparable to what we get with > allocation, which is unavoidable. > > Please see if the performance improvements you observe with this is > acceptable. > > Best, > Tiago > > > -- > Tiago de Paula Peixoto <[email protected]> > > _______________________________________________ > graph-tool mailing list > [email protected] > https://lists.skewed.de/mailman/listinfo/graph-tool >
_______________________________________________ graph-tool mailing list [email protected] https://lists.skewed.de/mailman/listinfo/graph-tool
