jrgemignani edited a comment on issue #195:
URL: https://github.com/apache/incubator-age/issues/195#issuecomment-1064297992
I think I need to clear up some misconceptions first -
The VLE is a path finding algorithm, its speed is directly related to the
size and connectedness of the graph and what it is looking for. Remember, this
is a graph, not a tree, so nodes can be reused in paths but not edges. This
means, the longer the path, the less qualified the edges are in a path, the
more time it will take in very connected graphs. Btw, other than the start and
terminal vertices, the algorithm doesn't care about the interior vertices; it
only cares about the edges.
If we look at the following queries:
MATCH p=(u)-[*]-(v) RETURN p
MATCH (u)-[*]-(v) RETURN u
MATCH ()-[*]-() RETURN COUNT(1);
All of these queries will basically do the same thing - search for all paths
of any length from all vertices to all vertices. This means that for a graph of
1 million nodes, this algorithm will be invoked up to 1 trillion times (the
number of VxV combinations). Now, add that you are asking for all of the paths
between each combination. So, each additional path length can add to the work
done. This will always be the case for finding paths. Now the algorithm does do
some things to help minimize this.
As you can see, the best way to deal with this is to limit what the
algorithm has to do. This means restricting where it needs to look and how
deeply it needs to look. Just remember, just because you restrict the input
vertices doesn't mean that you restrict the interior vertices. The algorithm
doesn't care about them, it only cares about interior edges. Additionally, due
to other implementation factors (PostgreSQL being a big one) the algorithm
generally generates all valid (within the selection window) paths from a start
vertex and leaves the match against the terminal vertex to PG.
I will add this -
There are 2 patches coming out in the next release that will improve
performance of some queries by up to 10%. I wouldn't expect their impact to be
that great but, that depends on the graph and the query. This is across the
board, btw, not just the VLE.
Additionally, I am currently working on a patch to the VLE that will negate
some of the overhead due to how PG calls the VLE function. This has the
potential of greatly increasing the speed of a large subset of queries.
However, it is unlikely to be available as part of the next release. But, it
should be available to pull down from the current master in about 1-2 weeks.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]