Hi all,

I am in the process of evaluating Neo4j for use in a production environment 
and I have encountered some difficulty when doing something that I expected 
to be simple. I have managed to solve it, but in a suboptimal and quite 
complicated way so I was thinking if there might be a simpler way to 
accomplish the same thing.

*Preamble*
Neo4j version 2.3.2

*TL;DR*
This is a bit of a long explanation so the summary is as follows:

Given node A, I need to find all nodes in the subgraph of A with a 
complexity of O(number_of_vertices + number_of_edges).

*Problem*
For our use case we have a graph that, with respect to one particular type 
of relationship, is broken up into smaller disconnected subgraphs of no 
more than a couple of dozen nodes each. What we are trying to accomplish 
is, given an indexed id from one of those nodes discover all nodes in the 
subgraph (while treating the graph as being undirected). The other 
peculiarity is that our nodes always have the reflexive edge from every 
node back to itself.

Algorithmically all we need is a breadth first search with an expected 
complexity of O(num_of_vertices + num_of_edges). Since our graphs are not 
dense the number of edges in them is roughly linear to the number of number 
of vertices so the overall complexity should be linear to the number of 
vertices in it.

*Test graph*
For simplicity I have made this test graph a fully connected one. Since the 
point is to compare cypher queries this does not affect the results.

Commands:

   - CREATE (:Label { id: 1 }),(:Label { id: 2 }),(:Label { id: 3 
   }),(:Label { id: 4 })
   - MATCH (a),(b) MERGE (a)-[:REL]->(b)
   
*Simple query*
The first query I tried to get the desired results is the following:

   - MATCH (a:Label {id:1})-[:REL*1..]-(b:Label) RETURN DISTINCT b
   
That query never terminated. When I added an upper bound for the 
relationship pattern and profiled the query I got the following:

   - Depth 4: 3902 db accesses
   - Depth 8: 1714982 db accesses

So instead of being linear to the number of vertices and edges this looks 
like it's finding all possible paths which of course explodes with depth.

*Better performing query*
To accomplish this I wrote the following query instead: 
https://gist.github.com/Dalamar42/1ec93cd74b01c145e7bd
(This will search to depth 2. Duplicate lines 6-16 to search to depth 4, 6, 
8, and so on)

The query does the following:

   1. Get the entry node
   2. Add that node to a collection of *nodes_found* and another of 
   *nodes_to_visit*
   3. For every node A in *nodes_to_visit* follow its edges to node B
   4. From nodes B keep only nodes that are not in *nodes_found*
   5. Set *nodes_to_visit* to nodes B and set *nodes_found* to its previous 
   value plus nodes B
   6. Repeat to the desired depth

This query should almost have the complexity of BFS except for one 
complexity. From what I understand every intermediate MATCH/WHERE needs to 
match at least one node otherwise cypher returns empty disregarding nodes 
found in the previous steps. I worked around this by changing step 4 to: "From 
nodes B keep only node A and nodes that are not in *nodes_found*". Because 
all nodes have the reflexive edge, node A will always be in the set of 
nodes B and by always keeping it I make sure that that part of the query 
will always match at least one node.

This means this query has the following problems:

   - I need to define the max search depth
   - Increasing depth does not come for free since I am exploring the edges 
   of A every time
   - This query is a pain for someone to read, especially given that this 
   will actually be only part of a bigger query that has to do this a couple 
   of times

*Better solution?*
Does anyone know of a better/simpler solution? Am I missing some obvious 
feature of Cypher? I wasn't able to find anything going through the 
documentation.

Thanks,
Thomas

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
For more options, visit https://groups.google.com/d/optout.

Reply via email to