[Neo4j] Multiple optional match in cypher query takes a long time

Marwa Elabri Sun, 26 Mar 2017 14:21:46 -0700


For my research, I am comparing SQL queries with Cypher queries. I have a
relational database (with 50,000 relationships) that I converted to a Neo4j
database, and I want to run the same queries against both and then improve
execution time by replacing the SQL queries with Cypher ones. My database
has entity classes, and each pair of entity classes is related by a
relationship class. I want to run join queries (in SQL) and Cypher queries
with OPTIONAL MATCH clauses that return the same results. In a relational
database, a query can start selecting data from either a relation table or
a class table, and a relation table corresponds to a relationship in Neo4j.
That is why I have two types of queries, as follows:


CALL apoc.index.relationships('relationshipclazz0','att0:*') YIELD rel AS R0
OPTIONAL MATCH (N:entityclazz0)<-[R0]-(N0:entityclazz1)
OPTIONAL MATCH (N0:entityclazz1)-[R1:relationshipclazz0]-()
WITH DISTINCT R0, R0.att0 AS AR0att0, count(R1.att1) AS AR1att1,
     R1.att1 AS BR1att1
ORDER BY AR1att1 DESC, BR1att1, ID(R0), AR0att0
WITH ID(R0) AS i, R0.att0 AS O1, head(collect(BR1att1)) AS O2, R0
RETURN O1, O2, count(i)
ORDER BY O1, O2

In the first query above I start from a relationship. The second query, in
which I start from an entityclazz node, is as follows:

CALL apoc.index.nodes('entityclazz1','att0:*') YIELD node AS N0
OPTIONAL MATCH (N0)-[R0:relationshipclazz0]-()
OPTIONAL MATCH (N0)-[R1:relationshipclazz0]-()
OPTIONAL MATCH ()-[R2:relationshipclazz0]-(N3:entityclazz1)
WITH DISTINCT N0, N0.att0 AS AN0att0, count(R0.att3) AS AR0att3,
     R0.att3 AS BR0att3, count(N3.att1) AS AN3att1, N3.att1 AS BN3att1
ORDER BY AN3att1 DESC, BN3att1, AR0att3 DESC, BR0att3, ID(N0), AN0att0
WITH ID(N0) AS i, N0.att0 AS O1, head(collect(BR0att3)) AS O2,
     head(collect(BN3att1)) AS O3, N0
RETURN O1, O2, O3, count(i)
ORDER BY O1, O2, O3

Although I use node and relationship indexes through the APOC procedures,
these queries take a very long time to return results: 2175417 ms (about 36
minutes) for the second query. As soon as I have more than two OPTIONAL
MATCH clauses in a query, it slows down and only returns after a long time.
However, I am obliged to split my query into separate OPTIONAL MATCH
clauses, because if I write a single path in one MATCH I only get the
result for the last node or relationship in the path, and not for all the
traversed nodes and relationships. With OPTIONAL MATCH I keep the result of
the previous match plus the result of the optional match.

For example, if I execute this query:

CALL apoc.index.nodes('entityclazz1','att0:*') YIELD node AS N0
OPTIONAL MATCH (N0:entityclazz0)<-[R0:relationshipclazz0]-()
WITH DISTINCT N0, N0.att0 AS AN0att0, count(R0.att1) AS AR0att1,
     R0.att1 AS BR0att1
ORDER BY AR0att1 DESC, BR0att1, ID(N0), AN0att0
WITH ID(N0) AS i, N0.att0 AS O1, head(collect(BR0att1)) AS O2, N0
RETURN O1, O2, count(i)
ORDER BY O1, O2

The result is:

O1  O2    count(i)
0   0     6
0   1     2
0   2     30
0   3     22
0   null  120
1   0     2
1   2     3
1   3     3
1   null  32
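
The null rows in this result are what OPTIONAL MATCH is expected to
produce: a node with no matching relationship is kept and its missing
values are filled with null, much like a left outer join in SQL. The
following is a minimal Python sketch of that behaviour, not Neo4j code; the
data and the function names inner_match and optional_match are made up for
illustration.

```python
# Hypothetical toy data: node ids, and relationship rows as (node_id, att1).
nodes = [1, 2, 3]
rels = [(1, "x"), (1, "y"), (3, "z")]


def inner_match(nodes, rels):
    """Like MATCH: a node with no relationship produces no row."""
    return [(n, a) for n in nodes for (m, a) in rels if m == n]


def optional_match(nodes, rels):
    """Like OPTIONAL MATCH: an unmatched node still yields one row,
    with None standing in for the missing relationship value."""
    rows = []
    for n in nodes:
        matched = [(n, a) for (m, a) in rels if m == n]
        rows.extend(matched if matched else [(n, None)])
    return rows


print(inner_match(nodes, rels))
print(optional_match(nodes, rels))
```

Node 2 has no relationship, so it disappears from the inner_match rows but
stays in the optional_match rows with a None value.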


But my problem is that I have to optimize the execution time of the query.
I use Neo4j 3.1.0.

<https://lh3.googleusercontent.com/-nJJwcmWvlL0/WNgure4xN1I/AAAAAAAAAXM/m2h6TUI4xjsoLBd01KyNMIP3PLyt-gOhwCLcB/s1600/meta_graph.png>



This is the property graph "meta_graph" of my graph database, in which I
have 5 node labels and 4 relationship types. Each node label corresponds to
an entity class table in the relational database, and each relationship
type corresponds to a relation ("join") table in the relational database.
<https://i.stack.imgur.com/302LU.png>

Now I want to find the links between each attribute X.A and its parents
X.K.B, where K is a path that relates the attribute X.A to its parents,
because I work in the domain of probabilistic graphical models.

To make this clearer, here is a real example. In the following pictures I
have three node labels (professor, course and student) and two relationship
types (takes and teachs); the properties ("attributes") are drawn in
circles. Red arcs represent dependency relationships between properties.
<https://i.stack.imgur.com/YDdmD.jpg>


<https://lh3.googleusercontent.com/-y9yayz4J7S8/WNgwWXBDRfI/AAAAAAAAAXU/FOi1JKDK3_cS8jUrLNOn_XfBCdjp_esYQCLcB/s1600/17474890_668100610060222_1222690006_n.jpg>


For example, here student.grade depends on student.intelligence and on
course.difficulty. To estimate the probability of this dependency, I need
to perform counts over each attribute and its parents. In the same example,
I need the count of
(take.grade=A,take.course.difficulty=low,take.student.intelligence=high):
for this instance, my query returns how many times this combination
("observation") occurs in my database. In a relational database, we need to
join relation and class tables using foreign keys in order to navigate
between tables and reach each attribute in its class or relation table.

In my case I work with a generic form of database, in which entity classes
are related by relationship classes. Entity and relationship classes have
attributes att0, att1, att2, ..., and each attribute ("property") has a
domain of values. For example, the domain of entityclazz0.att0 is [0,1,2]
and the domain of entityclazz1.att1 is [0,1], etc.
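
To make the counting step concrete, here is a minimal Python sketch of
counting joint observations. It assumes the rows have already been joined
into (grade, difficulty, intelligence) tuples; the toy data and the name
count_combinations are hypothetical and only illustrate the idea.

```python
from collections import Counter

# Hypothetical toy data: one tuple per "takes" relationship, already joined
# with the course and the student it connects:
# (grade, course difficulty, student intelligence).
observations = [
    ("A", "low", "high"),
    ("A", "low", "high"),
    ("B", "high", "high"),
    ("A", "high", "low"),
    ("C", "high", "low"),
]


def count_combinations(rows):
    """Count how often each (grade, difficulty, intelligence) tuple occurs."""
    return Counter(rows)


counts = count_combinations(observations)
# Number of observations with grade=A, difficulty=low, intelligence=high.
print(counts[("A", "low", "high")])
```

A combination that never occurs simply has a count of zero, so every value
in the attribute domains can be looked up without special handling.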

In SQL, computing these counts corresponds to SELECT queries that can
contain multiple joins over foreign keys. My idea is to replace these joins
with Cypher queries in which I use MATCH and OPTIONAL MATCH clauses to
count the properties ("attributes"). Examples of such queries are given
above in my first question and in my comments.

In a relational database, I can start selecting attributes either from a
relation ("join") table or from an entity class table. So in my Cypher
queries I sometimes have MATCH ()-[relationshipclazz]-() followed by a
series of OPTIONAL MATCH clauses, and sometimes MATCH (entityclazz)
followed by a series of OPTIONAL MATCH clauses, as mentioned in my first
question. Because I use the APOC procedures, I just replaced the first
MATCH clause with, respectively,

CALL apoc.index.relationships('relationshipclazz0','att0:*') YIELD rel AS R0

and

CALL apoc.index.nodes('entityclazz1','att0:*') YIELD node AS N0

Please help me if you have any ideas; it is urgent. I want to optimize my
queries because, even in a graph database with just 60,000 relationships
and nodes, this type of query takes a lot of time, especially when I have
multiple OPTIONAL MATCH clauses (which correspond to multiple joins in my
SQL query). When the SQL query has many levels of joins, I translate them
into multiple OPTIONAL MATCH clauses, yet this takes a lot of time in a
Cypher query.

I can also give you more details if you need them. Thanks in advance.
