Hello Tiago,

Thanks a lot for swift update. I timed the 2.22 against commit bf174831
<https://git.skewed.de/count0/graph-tool/commit/bf174831> (compiled with
CXXFLAGS=-O3). That does not give an accurate drill

On my typical ~7e6 vertices graph, for a typical search that reach about
1e5 vertices, the improvement is about 20%. We have 83ms (average over 100
runs) for the 2.22 release versus 69ms for commit bf174831. Interestingly,
the bf174831 version is more stable with std = 4ms versus 6ms for the 2.22
release.

What I understand from the boost thread
<https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk> is that
we could have only initialization for the whole lifespan of the program. If
I get it right, their proposal would, prior to a search, to "re-set" the
values from distances vector and predecessors vector only for the reached
vertices of the last search. I think it worth a try because, from what I
timed the python side of initialization could cost around 18ms (5ms for
distance map, 13ms for predecessor map) see timeit snipet below.

import timeit
# pred map
timeit.timeit(setup='import numpy as np;
x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.arange(7e6)',
number=1000)/1000
# distances
timeit.timeit(setup='import numpy as np;
x=np.zeros(int(7e6),dtype="int64")', stmt='x[:] = np.inf)',
number=1000)/1000

I did the above timings on a modern work station, to switch to a cloud
provider induces a non negligible overhead (+63% for the distances / +469%
for the pred map).

What do you think ?

F.



Le mer. 28 juin 2017 à 18:58, Tiago de Paula Peixoto <[email protected]> a
écrit :

> On 28.06.2017 15:00, François Kawala wrote:
> >
> > I try to dig up an old idea that we've discussed previously (the thread
> is
> > here
> > <
> http://main-discussion-list-for-the-graph-tool-project.982480.n3.nabble.com/Shortest-distance-complexity-when-used-with-max-dist-td4026018.html
> >).
> >
> > summary: when one runs shortest distance on a graph with several million
> > vertices the distance initialization cost is high with respect to the
> actual
> > Dijsktra cost. We want to circumvent the initialization [see L326 of
> > src/graph/topology/graph_distance.cc
> > <
> https://git.skewed.de/count0/graph-tool/blob/master/src/graph/topology/graph_distance.cc#L326
> >]
> >
> > The proposed approach (as per this boost thread
> > <https://groups.google.com/forum/#%21topic/boost-list/7nCzvkkEXCk>) is
> to
> > use the `dijkstra_shortest_paths_no_color_map_no_init`. The distance
> > initialization is done "on the fly" by the `examine_edge` function. More
> > precisely, the Dijkstra visitor maintains a set of discovered vertices.
> This
> > set is read in the `examine_edge` function to set the distance to
> infinity
> > when a the target vertex does not belong to the set of discovered
> vertices.
> > The `examine_edge` function is also responsible to update the  set of
> > discovered vertices.
>
> I've pushed a commit:
>
>    https://git.skewed.de/count0/graph-tool/commit/bf174831
>
> that replaces the search functions by no_init ones. The initialization is
> still done, but only once via a single numpy call in the Python side of
> things.
>
> The speed of the numpy call should be comparable to what we get with
> allocation, which is unavoidable.
>
> Please see if the performance improvements you observe with this is
> acceptable.
>
> Best,
> Tiago
>
>
> --
> Tiago de Paula Peixoto <[email protected]>
>
> _______________________________________________
> graph-tool mailing list
> [email protected]
> https://lists.skewed.de/mailman/listinfo/graph-tool
>
_______________________________________________
graph-tool mailing list
[email protected]
https://lists.skewed.de/mailman/listinfo/graph-tool

Reply via email to