Tom,

Please note that milestone releases are purely for preview/feedback use; you
shouldn't attempt production-like use with them. Just FYI.

Thanks a lot for your feedback. If the db created by the load-csv tasks works, I
can have a look at your query performance later.

In general, if you look at your query, it fans out over a large part of the
graph with a lot of path explosion, as you can also see in the profile output:
>                   "_rows" : 478380,
>                   "_db_hits" : 478518
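To make the path explosion concrete: the matcher emits one row per (j1, t, j2) path, so for a single anchored j1 the row count is the sum, over j1's terms, of how many jurts carry each term. Only afterwards does the aggregation collapse these into one row per j2 (478380 rows down to 9992 in your profile). Here is a minimal Python sketch, assuming the graph is held in memory as two plain dicts (jurt id to term set, and term to jurt set). The function name and the toy data are hypothetical and not part of any Neo4j API.

```python
from typing import Dict, Set, Tuple


def count_expanded_paths(terms_of: Dict[str, Set[str]],
                         jurts_of: Dict[str, Set[str]],
                         start: str) -> Tuple[int, int]:
    """Count (j1)-[:HAS_TERM]->(t)<-[:HAS_TERM]-(j2) paths for one anchor j1.

    Returns (all_paths, paths_where_j2_differs_from_j1).
    Assumes the two dicts are consistent: start is in jurts_of[t]
    for every term t in terms_of[start].
    """
    # Every term of the anchor contributes one path per jurt attached to it,
    # including the path that leads straight back to the anchor itself.
    all_paths = sum(len(jurts_of[term]) for term in terms_of[start])
    # Exactly one path per anchor term returns to the anchor (j2 = j1).
    self_paths = len(terms_of[start])
    return all_paths, all_paths - self_paths


if __name__ == "__main__":
    # Hypothetical toy data, stored in both directions like a graph store does.
    terms_of = {"J1": {"a", "b"}, "J2": {"a"}, "J3": {"a", "b"}}
    jurts_of = {"a": {"J1", "J2", "J3"}, "b": {"J1", "J3"}}
    print(count_expanded_paths(terms_of, jurts_of, "J1"))  # (5, 3)
```

The rows you pay for are the first number, not the three results you keep at the end, so terms attached to many jurts dominate the cost.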


You had an index on :jurt(jurt_id), right?

But it shouldn't take that long. 

I think the issue is the order of the conditions and the cross-path condition
NOT j1 = j2, which Cypher doesn't handle well yet. That's why pulling j1 out as
a precomputation (with WITH) lets Cypher filter out paths faster during
matching.

Removing the labels from your path (they are implied by the relationship types)
would also give a small speedup (one test fewer).

This should be faster:

    match (j1:jurt)
    where j1.jurt_id = {jurtid}
    with j1
    match (j1)-[:HAS_TERM]->(t)<-[:HAS_TERM]-(j2)
    where j2 <> j1
    return j1.jurt_id, j2.jurt_id, count(t) as commonterms
    order by commonterms desc
    limit 3
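For illustration only, here is what the rewritten query does step by step, as a minimal Python sketch over the same kind of in-memory dicts (hypothetical names and data, not a Neo4j API): resolve j1 once, expand from its terms, skip j2 = j1 while expanding rather than afterwards, count shared terms per j2, and keep the top three.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def most_similar(terms_of: Dict[str, Set[str]],
                 jurts_of: Dict[str, Set[str]],
                 jurt_id: str,
                 limit: int = 3) -> List[Tuple[str, str, int]]:
    """Rank other jurts by the number of terms they share with jurt_id."""
    common: Counter = Counter()
    # "match (j1:jurt) where j1.jurt_id = {jurtid} with j1": resolve the anchor once.
    anchor_terms = terms_of[jurt_id]
    for term in anchor_terms:            # (j1)-[:HAS_TERM]->(t)
        for other in jurts_of[term]:     # (t)<-[:HAS_TERM]-(j2)
            if other != jurt_id:         # "where j2 <> j1", checked during expansion
                common[other] += 1       # count(t), grouped by j2
    # "order by commonterms desc limit 3"; equal counts come back in no guaranteed order.
    return [(jurt_id, other, n) for other, n in common.most_common(limit)]


if __name__ == "__main__":
    # Hypothetical toy data, stored in both directions.
    terms_of = {"J1": {"a", "b", "c"}, "J2": {"a"}, "J3": {"a", "b"}, "J4": {"d"}}
    jurts_of = {"a": {"J1", "J2", "J3"}, "b": {"J1", "J3"}, "c": {"J1"}, "d": {"J4"}}
    print(most_similar(terms_of, jurts_of, "J1"))
    # [('J1', 'J3', 2), ('J1', 'J2', 1)]
```

J4 shares no term with J1 and is never touched, which is the point of anchoring on j1 before expanding.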


Cheers,

Michael

----
(michael)-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
Learn Online, Offline or Read a Book (in Deutsch)
We're trading T-shirts for cool GraphGist Models





On 09.03.2014 at 14:31, Tom Zeppenfeldt <[email protected]> wrote:

> I've been stuck for a couple of days with a performance problem, and even
> with queries that don't produce predictable results. I posted this one
> before, but now, after upgrading the machine from 2 to 8 GB of RAM (hoping it
> was a lack-of-memory issue), I still don't know how to solve this.
> 
> Environment:
> Cloud server with 8 GB RAM, 40 GB SSD disk, Ubuntu 13.10 x64
> Version : Neo4j 2.1.0-M01
> 
> Settings adapted from defaults
> in neo4j-wrapper.conf
> wrapper.java.initmemory=4096
> wrapper.java.maxmemory=4096
> 
> in neo4j.properties
> neostore.nodestore.db.mapped_memory=50M
> neostore.relationshipstore.db.mapped_memory=100M
> neostore.propertystore.db.mapped_memory=180M
> neostore.propertystore.db.strings.mapped_memory=260M
> neostore.propertystore.db.arrays.mapped_memory=260M
> 
> Server info:
> HeapMemoryUsage
> committed 4260102144
> init 4294967296
> max 4260102144
> used 302304640
> 
> NonHeapMemoryUsage
> committed 64909312
> init 24313856
> max 136314880
> used 54561464
> 
> Size of database
> Primitive count
> NumberOfRelationshipIdsInUse 1294128
> NumberOfNodeIdsInUse 55746
> NumberOfPropertyIdsInUse 55036
> NumberOfRelationshipTypeIdsInUse 11
> 
> Model
> 
> about 10k :jurt (some kind of document) nodes and 12k :Term nodes, with
> about 1.2M (jurt)-[:HAS_TERM]->(term) rels
> 
> Use
> 
> Find similar jurts based on the number of common terms, like this:
> 
> match (j1:jurt)-[:HAS_TERM]->(t:Term)<-[:HAS_TERM]-(j2:jurt)
> where NOT j1=j2 AND j1.jurt_id = {jurtid}
> with j1,j2,count(t) as commonterms
> return j1.jurt_id,j2.jurt_id,commonterms
> order by commonterms desc
> limit 3
> 
> The query above takes 5-6 seconds, which is way too long, I guess.
> 
> Query plan
> 
> {
>   "columns" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
>   "data" : [ [ "J70000", "J72191", 68 ], [ "J70000", "J73483", 67 ], [ "J70000", "J72924", 66 ] ],
>   "plan" : {
>     "args" : {
>       "returnItemNames" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
>       "_rows" : 3,
>       "_db_hits" : 0,
>       "symKeys" : [ "j1", "j2.jurt_id", "j1.jurt_id", "commonterms", "j2" ]
>     },
>     "dbHits" : 0,
>     "name" : "ColumnFilter",
>     "children" : [ {
>       "args" : {
>         "limit" : "Literal(3)",
>         "orderBy" : [ "SortItem(commonterms,false)" ],
>         "_rows" : 3,
>         "_db_hits" : 0
>       },
>       "dbHits" : 0,
>       "name" : "Top",
>       "children" : [ {
>         "args" : {
>           "_rows" : 9992,
>           "_db_hits" : 19984,
>           "exprKeys" : [ "j1.jurt_id", "j2.jurt_id" ],
>           "symKeys" : [ "j1", "j2", "commonterms" ]
>         },
>         "dbHits" : 19984,
>         "name" : "Extract",
>         "children" : [ {
>           "args" : {
>             "returnItemNames" : [ "j1", "j2", "commonterms" ],
>             "_rows" : 9992,
>             "_db_hits" : 0,
>             "symKeys" : [ "j1", "j2", "  INTERNAL_AGGREGATE8b273443-699b-4262-8a48-41d7a316fa44" ]
>           },
>           "dbHits" : 0,
>           "name" : "ColumnFilter",
>           "children" : [ {
>             "args" : {
>               "keys" : [ "j1", "j2" ],
>               "_rows" : 9992,
>               "aggregates" : [ "(  INTERNAL_AGGREGATE8b273443-699b-4262-8a48-41d7a316fa44,Count(t))" ],
>               "_db_hits" : 0
>             },
>             "dbHits" : 0,
>             "name" : "EagerAggregation",
>             "children" : [ {
>               "args" : {
>                 "_rows" : 478380,
>                 "_db_hits" : 0,
>                 "pred" : "(NOT(j1 == j2) AND hasLabel(j2:jurt(3)))"
>               },
>               "dbHits" : 0,
>               "name" : "Filter",
>               "children" : [ {
>                 "args" : {
>                   "start" : {
>                     "identifiers" : [ "j1" ],
>                     "query" : "{jurtid}",
>                     "producer" : "SchemaIndex",
>                     "property" : "jurt_id",
>                     "label" : "jurt"
>                   },
>                   "trail" : "(j1)-[  UNNAMED15:HAS_TERM WHERE (hasLabel(NodeIdentifier():Term(1)) AND hasLabel(NodeIdentifier():Term(1))) AND true]->(t)<-[  UNNAMED37:HAS_TERM WHERE hasLabel(NodeIdentifier():jurt(3)) AND true]-(j2)",
>                   "_rows" : 478380,
>                   "_db_hits" : 478518
>                 },
>                 "dbHits" : 478518,
>                 "name" : "TraversalMatcher",
>                 "children" : [ ],
>                 "rows" : 478380
>               } ],
>               "rows" : 478380
>             } ],
>             "rows" : 9992
>           } ],
>           "rows" : 9992
>         } ],
>         "rows" : 9992
>       } ],
>       "rows" : 3
>     } ],
>     "rows" : 3
>   }
> }
> 
> 
> 
> Every time I change the value of the {jurtid} param, the first run of the
> query returns different results than the subsequent runs.
> 
> When I provide two params, like this:
> 
> match (j1:jurt)-[:HAS_TERM]->(t:Term)<-[:HAS_TERM]-(j2:jurt) 
> where NOT j1=j2 AND ((j1.jurt_id = {id1}) OR (j1.jurt_id = {id2})) 
> with j1,j2,count(t) as commonterms 
> return j1.jurt_id,j2.jurt_id,commonterms 
> order by commonterms desc limit 3
> 
> it doesn't return anything at all in the shell, and in the browser I get an
> "Unknown error".
> 
> 
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
