Re: [Neo4j] Creating efficient multiple match queries?

Michael Hunger Fri, 20 Mar 2015 13:55:52 -0700

Hi Scott,

I presume you created indexes or constraints for :ObjectConcept(name)?


> On 20.03.2015 at 17:22, Scott Campbell <[email protected]> wrote:
> 
> I am working with an acyclic, directed graph (an ontology) that models human 
> health and am needing to identify certain diseases (example: Pneumonia) that 
> are infectious but NOT caused by certain bacteria (staph or streptococcus).  
> All concepts are Nodes defined as ObjectConcepts.  ObjectConcepts are 
> connected by relationships such as [ISA], [Pathological_process], 
> [Causative_agent], etc. 
> 
> The query requires:
> 
>  a) Identification of all concepts subsumed by the concept Pneumonia as 
> follows:
> 
> MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept)
This alone already returns a large number of paths, potentially millions. Can 
you check that with:
> MATCH p = (a:ObjectConcept{Pneumonia}) <-[:ISA*]- (b:ObjectConcept) return 
> count(*) 
> 
> b) Identification of all concepts subsumed by Genus Staph and Genus Strep 
> (including the concept Genus Staph and Genus Strep) as follows.  Note: 
> 
> with b MATCH (b) q = (c:ObjectConcept{Strep})<-[:ISA*]-(d:ObjectConcept), h = 
> (e:ObjectConcept{Staph})<-[:ISA*]-(f:ObjectConcept) 
> 
this is then the cross product of the paths from "p", "q" and "h", e.g. if all 
3 of them return 1000 paths, you're at 1bn paths !!
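
To make the arithmetic concrete, here is a minimal Python sketch (not Cypher, and not how Neo4j evaluates the query internally). The helper name combined_rows and the path counts are made up for illustration; it only shows that rows multiply when independent patterns are matched together.

```python
from itertools import product

def combined_rows(*path_counts):
    """Rows produced when independent patterns are combined as a cross product."""
    total = 1
    for n in path_counts:
        total *= n
    return total

# Small enough to enumerate: three patterns with 10 paths each.
p, q, h = range(10), range(10), range(10)
enumerated = sum(1 for _ in product(p, q, h))
print(enumerated)                          # 1000

# The case from the text: 1000 paths each for p, q and h.
print(combined_rows(1000, 1000, 1000))     # 1000000000
```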

> c) Identify all nodes(p) that do not have a causative agent of Strep (i.e., 
> nodes(q)) or Staph (nodes(h)) as follows:
> 
> with b,c,d,e,f MATCH (b),(c),(d),(e),(f) WHERE (b)--()-->(c) OR 
> (b)-->()-->(d) OR (b)-->()-->(e) OR (b)-->()-->(f) RETURN distinct b.Name;

you don't need the WITH or even the MATCH (b),(c),(d),(e),(f)

What connections are there between b and the other nodes? Do you have concrete 
ones? The first pattern, (b)--()-->(c), is also missing a direction.

the where clause can be a problem, in general you want to show that 

perhaps this query is better reproduced by a UNION of simpler matches

e.g.

MATCH (a:ObjectConcept {name:'Pneumonia'})<-[:ISA*]-(b:ObjectConcept)-->()-->(c:ObjectConcept {name:'Strep'})
RETURN b.name
UNION
MATCH (a:ObjectConcept {name:'Pneumonia'})<-[:ISA*]-(b:ObjectConcept)-->()-->(e:ObjectConcept {name:'Staph'})
RETURN b.name
UNION
MATCH (a:ObjectConcept {name:'Pneumonia'})<-[:ISA*]-(b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept {name:'Strep'})
RETURN b.name
UNION
MATCH (a:ObjectConcept {name:'Pneumonia'})<-[:ISA*]-(b:ObjectConcept)-->()-->(d:ObjectConcept)-[:ISA*]->(c:ObjectConcept {name:'Staph'})
RETURN b.name
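
The idea behind splitting the query up can be sketched in plain Python on a toy graph: compute each set once, then combine the sets, instead of matching everything in one pattern. All node names, the HAS_GROUP relationship and the helper functions below are hypothetical; only ISA and Causative_agent come from the description of the ontology, and I'm assuming the two-hop pattern (b)-->()-->(agent) goes through some intermediate node.

```python
from collections import defaultdict

# Hypothetical toy ontology: (source, relationship type, target) triples.
EDGES = [
    ("Bacterial pneumonia", "ISA", "Pneumonia"),
    ("Viral pneumonia", "ISA", "Pneumonia"),
    ("Staph pneumonia", "ISA", "Bacterial pneumonia"),
    ("Strep pneumonia", "ISA", "Bacterial pneumonia"),
    ("Staph aureus", "ISA", "Staph"),
    ("Strep pneumoniae", "ISA", "Strep"),
    # Two-hop pattern (b)-->()-->(agent) via an intermediate node (assumed).
    ("Staph pneumonia", "HAS_GROUP", "group1"),
    ("group1", "Causative_agent", "Staph aureus"),
    ("Strep pneumonia", "HAS_GROUP", "group2"),
    ("group2", "Causative_agent", "Strep"),
    ("Viral pneumonia", "HAS_GROUP", "group3"),
    ("group3", "Causative_agent", "Virus"),
]

outgoing = defaultdict(list)      # node -> [(rel_type, target)]
isa_children = defaultdict(list)  # node -> direct subtypes
for src, rel, dst in EDGES:
    outgoing[src].append((rel, dst))
    if rel == "ISA":
        isa_children[dst].append(src)

def subsumed(root, include_root=False):
    """All concepts that reach root over one or more ISA edges."""
    seen, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in isa_children.get(node, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    if include_root:
        seen.add(root)
    return seen

def two_hop_targets(node):
    """Targets of (node)-->()-->(target), any relationship type."""
    return {t for _, mid in outgoing.get(node, [])
              for _, t in outgoing.get(mid, [])}

# Each set is computed once; nothing is multiplied together.
pneumonias = subsumed("Pneumonia")
excluded_agents = (subsumed("Strep", include_root=True)
                   | subsumed("Staph", include_root=True))

caused_by_excluded = {b for b in pneumonias
                      if two_hop_targets(b) & excluded_agents}
not_caused = pneumonias - caused_by_excluded

print(sorted(caused_by_excluded))  # ['Staph pneumonia', 'Strep pneumonia']
print(sorted(not_caused))          # ['Bacterial pneumonia', 'Viral pneumonia']
```

The set caused_by_excluded corresponds to what the UNION branches return; the "NOT caused by" result is then a set difference against all concepts under Pneumonia.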


another option would be to utilize the shortestPath() function to find one or 
all shortest path(s) between Pneumonia and the bacteria with certain rel-types 
and direction.
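
As a rough illustration of what "one shortest path with certain rel-types and direction" means, here is a breadth-first search in Python over a toy edge list. This is only a sketch of the concept, not Neo4j's shortestPath() implementation; the edges and the function name shortest_path are invented for the example.

```python
from collections import deque

# Hypothetical directed, typed edges: (source, relationship type, target).
EDGES = [
    ("Staph pneumonia", "ISA", "Bacterial pneumonia"),
    ("Bacterial pneumonia", "ISA", "Pneumonia"),
    ("Staph pneumonia", "Causative_agent", "Staph aureus"),
    ("Staph aureus", "ISA", "Staph"),
]

def shortest_path(start, goal, allowed_types):
    """Breadth-first search that follows edge direction and only allowed_types."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for src, rel, dst in EDGES:
            if src == path[-1] and rel in allowed_types and dst not in visited:
                visited.add(dst)
                queue.append(path + [dst])
    return None  # goal not reachable with these relationship types

path = shortest_path("Staph pneumonia", "Staph", {"Causative_agent", "ISA"})
print(path)   # ['Staph pneumonia', 'Staph aureus', 'Staph']

# Restricting to ISA only leaves no route to the bacterium.
print(shortest_path("Staph pneumonia", "Staph", {"ISA"}))   # None
```

The search stops at the first path it finds, so it never enumerates every possible path between the two concepts.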

Perhaps you can share the dataset and the expected result.

Michael
> 
> The query returns the correct result, but runs for 20 min.  However, running 
> the same query without Strep or Staph concepts, the query returns a correct 
> result in < 0.5 seconds.  XOR operators also work when and/or results are 
> needed, but speed is still an issue.
> 
> I am new to cypher, but I am sure that there is a more efficient query for 
> this problem.  I am unsure if/how collections and lists could be employed and 
> improve run times.  Suggestions?
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] 
> <mailto:[email protected]>.
> For more options, visit https://groups.google.com/d/optout 
> <https://groups.google.com/d/optout>.
