Perhaps it makes more sense to handle your subtypes with labels instead?
And I'd love to see a picture :)
NEGATION is always tricky to handle.
how many paths does this return?
> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
and how many does this one return?
> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
Would this perhaps work too?
> start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> with collect(be1) as not_head_nodes, collect(be2) as not_last_nodes
> start ds1=node:genNodeIdx('subtype:type1')
> match
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]->(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> where be1 not in not_head_nodes AND be2 not in not_last_nodes
> return be1.identifier, be2.identifier
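One caveat on the query above, sketched in plain Python rather than Cypher: testing be1 and be2 separately against two collections is stricter than excluding exact (be1, be2) pairs, so it may drop more rows than a relational MINUS would. The sketch also shows why a hash-set lookup avoids the 'order n squared' scan over not_paths. The identifiers g1..g6 and both function names are made up for illustration, and the direction of the DATA relationship is ignored.

```python
# Rows stand in for (be1.identifier, be2.identifier) results of the two MATCH parts.
# All identifiers below are hypothetical sample data, not taken from the real graph.

def minus_pairs(type1_pairs, other_pairs):
    """Relational MINUS on pairs: keep type1 pairs that never occur elsewhere."""
    excluded = set(other_pairs)  # built once; each lookup is O(1) on average
    return [pair for pair in type1_pairs if pair not in excluded]


def minus_nodewise(type1_pairs, other_pairs):
    """Mirror of the collect(be1) / collect(be2) filter: each end is tested alone."""
    not_head_nodes = {head for head, _ in other_pairs}
    not_last_nodes = {last for _, last in other_pairs}
    return [
        (head, last)
        for head, last in type1_pairs
        if head not in not_head_nodes and last not in not_last_nodes
    ]


type1_pairs = [("g1", "g2"), ("g1", "g3"), ("g4", "g5")]
other_pairs = [("g1", "g3"), ("g6", "g2")]

pair_result = minus_pairs(type1_pairs, other_pairs)
node_result = minus_nodewise(type1_pairs, other_pairs)

print(pair_result)
print(node_result)
```

Here ("g1", "g2") survives the pair-wise MINUS, because that exact pair never shows up in the other subtypes, but the node-wise filter drops it since g1 appears as a head and g2 as a last node somewhere else. If the exact-pair semantics matter, the two variants are not interchangeable.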
On 19.05.2014 at 18:33, Michael Miller <[email protected]> wrote:
> hi all,
>
> we are running into a similar problem, also in the bio research space (not
> surprising). although the graph has lots of aspects to it, at the heart we
> have Bioentity nodes (genes, proteins, etc) that have a BELONG_IN
> relationship to AnalysisSetSlice nodes and the AnalysisSetSlice nodes have a
> DATA relationship between them and the AnalysisSetSlice nodes have a subtype
> property such that only AnalysisSetSlice nodes with the same subtype property
> can have a DATA relationship between them. a Bioentity node might BELONG_IN
> 1-4 AnalysisSetSlice nodes and AnalysisSetSlice nodes might have DATA to 32
> other AnalysisSetSlice nodes. when we know what Bioentities we are
> interested in, then the queries are relatively quick but when we want to
> partition using the DATA relationship on AnalysisSetSlice.subtype is when the
> query never returns (at least after running for two days) even tho the box is
> not using all the memory and logs aren't showing any particular problem. For
> a smaller test graph, the query does return with the correct nodes.
>
> there are ~20,000 Bioentities, ~80,000 AnalysisSetSlices divided into 4
> subtypes of ~20,000 nodes each. there are ~2,500,000 DATA relationships so
> ~32 per AnalysisSetSlice node. we're using 2.0.1, the java embedded
> database and issuing cypher queries. the query we ran was to discover what
> pairs of Bioentity nodes only had a relationship through DATA in one subtype.
> here's the query i came up with:
> "start ds1=node:genNodeIdx('subtype:type2 OR subtype:type3 OR subtype:type4')
> match
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]-(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> with collect(distinct [be1, be2]) as not_paths
> start ds1=node:genNodeIdx('subtype:type1')
> match
> (be1:Bioentity)-[:BELONG_IN]-(ds1:AnalysisSetSlice)-[:DATA]->(ds2:AnalysisSetSlice)-[:BELONG_IN]-(be2:Bioentity)
> where none(not_path in not_paths where be1 = head(not_path) and be2 =
> last(not_path))
> return be1.identifier, be2.identifier"
> the problem, it looks like, is that this is 'order n squared', not ideal.
> what i wanted was the graph equivalent of the relational MINUS operator, i
> think. when i remove the where clause, that query doesn't take long at all.
> is there a better way to formulate this query?
>
> thanks,
> michael
>
> On Sunday, May 18, 2014 9:06:29 AM UTC-7, Alex Frieden wrote:
> Hi guys,
> My group is starting to get into pretty large datasets. I was wondering if
> users can talk about their large datasets and how they dealt with them.
> By large I am talking about a neo4j database over 1TB. However, any stories
> of scaling data would be useful. Thanks!
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.