Re: [Neo4j] large datasets

Michael Hunger Tue, 10 Jun 2014 10:32:32 -0700

Perhaps it makes more sense to handle your subtypes with labels instead?

And I'd love to see a picture :)


NEGATION is always tricky to handle.

how many paths does this return?
> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match 
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)

and how many nodes does this return?
> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
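
(A rough back-of-envelope estimate of both counts, using only the figures from the quoted message below. It assumes each AnalysisSetSlice belongs to exactly one Bioentity, which the message does not state, and the variable names are made up for illustration. The real counts still have to come from running the queries.)

```python
# Figures taken from the quoted message (all approximate).
slices_per_subtype = 20_000
num_subtypes = 4
data_rels_total = 2_500_000

# DATA only connects slices of the same subtype, so split the total evenly.
rels_per_subtype = data_rels_total // num_subtypes           # 625,000

# Index lookup for type2..type4: three subtypes' worth of start nodes.
excluded_subtypes = 3
start_nodes = excluded_subtypes * slices_per_subtype         # 60,000

# The first match is undirected on DATA, and both endpoints of every DATA
# relationship are in the start set, so each relationship matches twice.
# ASSUMPTION: one Bioentity per slice, so BELONG_IN adds no extra fan-out.
undirected_paths = 2 * excluded_subtypes * rels_per_subtype  # 3,750,000

# The type1 match is directed, so each relationship matches once.
type1_paths = rels_per_subtype                               # 625,000

# Upper bound on comparisons for a none(...) scan over the collected pairs:
# every type1 row is checked against every collected pair.
worst_case_comparisons = type1_paths * undirected_paths

print(start_nodes, undirected_paths, worst_case_comparisons)
```

If the real numbers are anywhere near this, the where clause alone accounts for trillions of comparisons, which would fit the "order n squared" behaviour described below.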


would this perhaps work too?

> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match 
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> with collect(be1) as not_head_nodes, collect(be2) as not_last_nodes
> start ds1=node:genNodeIdx('subtype:type1')
> match 
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]->(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> where not(be1 in not_head_nodes) AND not(be2 in not_last_nodes)
> return be1.identifier, be2.identifier
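
(To make the difference between the two formulations concrete, here is a small sketch in plain Python rather than Cypher. The function names and the toy gene identifiers are invented for illustration. It compares a relational-MINUS-style difference over (be1, be2) pairs, which is what the quoted question asks for, with filtering on two separate node lists as in the query above. The two are not equivalent: the two-list filter can drop pairs that never occurred together in the other subtypes.)

```python
def pairs_only_in(candidate_pairs, excluded_pairs):
    """MINUS over pairs: keep candidates that never appear as an excluded pair."""
    excluded = set(excluded_pairs)  # hashed lookup instead of a scan per row
    return [pair for pair in candidate_pairs if pair not in excluded]


def filter_by_separate_lists(candidate_pairs, excluded_pairs):
    """Two-list variant: drop a pair if either end was seen in its position."""
    heads = {first for first, _ in excluded_pairs}
    lasts = {second for _, second in excluded_pairs}
    return [(a, b) for a, b in candidate_pairs
            if a not in heads and b not in lasts]


# Hypothetical toy data: pairs connected via type1, and via the other subtypes.
type1_pairs = [("g1", "g2"), ("g1", "g3"), ("g4", "g5")]
other_pairs = [("g1", "g2"), ("g6", "g3")]

minus_result = pairs_only_in(type1_pairs, other_pairs)
separate_result = filter_by_separate_lists(type1_pairs, other_pairs)

print(minus_result)     # [('g1', 'g3'), ('g4', 'g5')]
print(separate_result)  # [('g4', 'g5')]
```

Here ("g1", "g3") is connected only through type1, yet the two-list filter removes it because g1 and g3 each show up in some other pair. Whether that stricter result is acceptable depends on what the analysis needs.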

On 19.05.2014 at 18:33, Michael Miller <[email protected]> wrote:

> hi all,
>  
> we are running into a similar problem, also in the bio research space (not 
> surprising).  although the graph has lots of aspects to it, at the heart we 
> have Bioentity nodes (genes, proteins, etc) that have a BELONG_IN 
> relationship to AnalysisSetSlice nodes and the AnalysisSetSlice nodes have a 
> DATA relationship between them and the AnalysisSetSlice nodes have a subtype 
> property such that only AnalysisSetSlice nodes with the same subtype property 
> can have a DATA relationship between them.  a Bioentity node might BELONG_IN 
> 1-4 AnalysisSetSlice nodes and AnalysisSetSlice nodes might have DATA to 32 
> other AnalysisSetSlice nodes.  when we know what Bioentities we are 
> interested in, the queries are relatively quick, but when we want to 
> partition on AnalysisSetSlice.subtype using the DATA relationship, the 
> query never returns (at least not after running for two days) even though 
> the box is not using all the memory and the logs aren't showing any 
> particular problem.  For 
> a smaller test graph, the query does return with the correct nodes.
>  
> there are ~20,000 Bioentities, ~80,000 AnalysisSetSlices divided into 4 
> subtypes of ~20,000 nodes each.  there are ~2,500,000 DATA relationships so 
> ~32 per AnalysisSetSlice node.  we're using 2.0.1, the java embedded 
> database and issuing cypher queries. the query we ran was to discover what 
> pairs of Bioentity nodes only had a relationship through DATA in one subtype. 
> here's the query i came up with:
> "start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match 
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> with collect(distinct [be1, be2]) as not_paths
> start ds1=node:genNodeIdx('subtype:type1')
> match 
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]->(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> where none(not_path in not_paths where be1 = head(not_path) and be2 = 
> last(not_path))
> return be1.identifier, be2.identifier"
> the problem, it looks like, is that this is 'order n squared', not ideal.  
> what i wanted was the graph equivalent of the relational MINUS operator, i 
> think.  when i remove the where clause, that query doesn't take long at all.  
> is there a better way to formulate this query?
>  
> thanks,
> michael
>  
> On Sunday, May 18, 2014 9:06:29 AM UTC-7, Alex Frieden wrote:
> Hi guys,
> My group is starting to get into pretty large datasets.  I was wondering if 
> users can talk about their large datasets and how they handled dealing with 
> them.  
> By large I am talking about a neo4j database over 1TB.  However, any stories 
> of scaling data would be useful.  Thanks!
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
