jrgemignani commented on issue #195:
URL: https://github.com/apache/incubator-age/issues/195#issuecomment-1064297992


   I think I need to clear up some misconceptions first -
   
   The VLE is a path finding algorithm, its speed is directly related to the 
size and connectedness of the graph and what it is looking for. Remember, this 
is a graph, not a tree, so nodes can be reused in paths but not edges. This 
means, the longer the path, the less qualified the edges are in a path, the 
more time it will take in very connected graphs. Btw, other than the start and 
terminal vertices, the algorithm doesn't care about the interior vertices; it 
only cares about the edges. 
   
   If we look at the following queries: 
   
       MATCH p=(u)-[*]-(v) RETURN p
       MATCH (u)-[*]-(v) RETURN u
       MATCH ()-[*]-() RETURN COUNT(1);
   
   All of these queries will basically do the same thing - search for all paths 
of any length from all vertices to all vertices. This means that for a graph of 
1 million nodes, this algorithm will be invoked up to 1 trillion times (the 
number of VxV combinations). Now, add that you are asking for all of the paths 
between each combination. So, each additional path length can add to the work 
done. This will always be the case for finding paths.
   
   As you can see, the best way to deal with this is to limit what the 
algorithm has to do. This means restricting where it needs to look and how 
deeply it needs to look. Just remember, just because you restrict the input 
vertices doesn't mean that you restrict the interior vertices. The algorithm 
doesn't care about them, it only cares about interior edges. Additionally, due 
to other implementation factors (PostgreSQL being a big one) the algorithm 
generally generates all valid (within the selection window) paths from a start 
vertex and leaves the match against the terminal vertex to PG.
   
   I will add this -
   
   There are 2 patches coming out in the next release that will improve 
performance of some queries by up to 10%. I wouldn't expect their impact to be 
that great but, that depends on the graph and the query. This is across the 
board, btw, not just the VLE.
   
   Additionally, I am currently working on a patch to the VLE that will negate 
some of the overhead due to how PG calls the VLE function. This has the 
potential of greatly increasing the speed of a large subset of queries. 
However, it is unlikely to be available as part of the next release. But, it 
should be available to pull down from the current master in about 1-2 weeks.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to