Tom,
please note that milestone releases are purely for preview / feedback usage; you
shouldn't attempt production-like use with them. Just FYI.
Thanks a lot for your feedback. If the db created by the load-csv tasks works, I
can have a look at your query performance later.
In general, if you look at your query, it is a global graph scan with a lot of
path explosions (as you can also see in the profile output):
> "_rows" : 478380,
> "_db_hits" : 478518
You had an index on :jurt(jurt_id) right?
But it shouldn't take that long.
I think the issue is the order of conditions and the cross-path condition NOT
j1 = j2, which Cypher is not good at yet.
That's why pulling j1 out as a precomputation allows Cypher to filter out paths
faster during the matching.
Removing the labels from your path (as they are implied by the rels) would also
add a small speedup (one test fewer).
This should be faster:
> match (j1:jurt)
> where j1.jurt_id = {jurtid}
> with j1
> match (j1)-[:HAS_TERM]->(t)<-[:HAS_TERM]-(j2)
> where j2 <> j1
> return j1.jurt_id,j2.jurt_id, count(t) as commonterms
> order by commonterms desc
> limit 3
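To make explicit what the rewritten query computes, here is a minimal sketch in plain Python (not Neo4j code, and not how Cypher executes it internally). It assumes the graph fits in a dict from jurt id to its set of terms. The names `has_term` and `top_similar` and the toy data are hypothetical, made up for illustration only. The point is that the traversal starts from the single anchored jurt and only expands that node's terms:

```python
from collections import Counter, defaultdict


def top_similar(has_term, jurt_id, limit=3):
    """Return up to `limit` (other_jurt_id, common_term_count) pairs for one jurt."""
    # Inverted index: term -> set of jurts that have it (the incoming HAS_TERM rels).
    jurts_by_term = defaultdict(set)
    for jurt, terms in has_term.items():
        for term in terms:
            jurts_by_term[term].add(jurt)

    common = Counter()
    # Start from the single anchored jurt (j1) and expand only its own terms.
    for term in has_term.get(jurt_id, ()):
        for other in jurts_by_term[term]:
            if other != jurt_id:  # the j2 <> j1 check, applied per path
                common[other] += 1

    # Order by count descending; break ties by id so the result is deterministic.
    return sorted(common.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


# Toy data, invented purely for illustration.
has_term = {
    "J1": {"a", "b", "c"},
    "J2": {"a", "b"},
    "J3": {"c"},
    "J4": {"x"},
}
print(top_similar(has_term, "J1"))  # [('J2', 2), ('J3', 1)]
```

The work is proportional to the number of two-step paths leaving the one start node, which is the same quantity the `_rows` figure in the profile reports.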
Cheers,
Michael
----
(michael)-[:SUPPORTS]->(YOU)-[:USE]->(Neo4j)
On 09.03.2014 at 14:31, Tom Zeppenfeldt <[email protected]> wrote:
> I've been stuck for a couple of days with a performance problem, or even with
> queries that do not produce predictable results. I posted this one before,
> but now, after upgrading the machine from 2 to 8GB (hoping it was a
> lack-of-memory issue), I don't know how to solve this.
>
> Environment :
> Cloudserver with : 8GB Ram 40GB SSD Disk Ubuntu 13.10 x64
> Version : Neo4j 2.1.0-M01
>
> Settings adapted from defaults
> in neo4j-wrapper.conf
> wrapper.java.initmemory=4096
> wrapper.java.maxmemory=4096
>
> in neo4j.properties
> neostore.nodestore.db.mapped_memory=50M
> neostore.relationshipstore.db.mapped_memory=100M
> neostore.propertystore.db.mapped_memory=180M
> neostore.propertystore.db.strings.mapped_memory=260M
> neostore.propertystore.db.arrays.mapped_memory=260M
>
> Server info:
> HeapMemoryUsage
> committed 4260102144
> init 4294967296
> max 4260102144
> used 302304640
>
> NonHeapMemoryUsage
> committed 64909312
> init 24313856
> max 136314880
> used 54561464
>
> Size of database
> Primitive count
> NumberOfRelationshipIdsInUse 1294128
> NumberOfNodeIdsInUse 55746
> NumberOfPropertyIdsInUse 55036
> NumberOfRelationshipTypeIdsInUse 11
>
> Model
>
> about 10k :jurt (= some kind of document) nodes and 12k :Term nodes, with
> about 1.2M (jurt)-[:HAS_TERM]->(term) rels
>
> Use
>
> Find similar jurts based on number of common terms, like this:
>
> match (j1:jurt)-[:HAS_TERM]->(t:Term)<-[:HAS_TERM]-(j2:jurt)
> where NOT j1=j2 AND j1.jurt_id = {jurtid}
> with j1,j2,count(t) as commonterms
> return j1.jurt_id,j2.jurt_id,commonterms
> order by commonterms desc
> limit 3
>
> the query above takes 5-6 secs, way too long I guess.
>
> Query plan
>
> {
> "columns" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
> "data" : [ [ "J70000", "J72191", 68 ], [ "J70000", "J73483", 67 ], [ "J70000", "J72924", 66 ] ],
> "plan" : {
> "args" : {
> "returnItemNames" : [ "j1.jurt_id", "j2.jurt_id", "commonterms" ],
> "_rows" : 3,
> "_db_hits" : 0,
> "symKeys" : [ "j1", "j2.jurt_id", "j1.jurt_id", "commonterms", "j2" ]
> },
> "dbHits" : 0,
> "name" : "ColumnFilter",
> "children" : [ {
> "args" : {
> "limit" : "Literal(3)",
> "orderBy" : [ "SortItem(commonterms,false)" ],
> "_rows" : 3,
> "_db_hits" : 0
> },
> "dbHits" : 0,
> "name" : "Top",
> "children" : [ {
> "args" : {
> "_rows" : 9992,
> "_db_hits" : 19984,
> "exprKeys" : [ "j1.jurt_id", "j2.jurt_id" ],
> "symKeys" : [ "j1", "j2", "commonterms" ]
> },
> "dbHits" : 19984,
> "name" : "Extract",
> "children" : [ {
> "args" : {
> "returnItemNames" : [ "j1", "j2", "commonterms" ],
> "_rows" : 9992,
> "_db_hits" : 0,
> "symKeys" : [ "j1", "j2", " INTERNAL_AGGREGATE8b273443-699b-4262-8a48-41d7a316fa44" ]
> },
> "dbHits" : 0,
> "name" : "ColumnFilter",
> "children" : [ {
> "args" : {
> "keys" : [ "j1", "j2" ],
> "_rows" : 9992,
> "aggregates" : [ "( INTERNAL_AGGREGATE8b273443-699b-4262-8a48-41d7a316fa44,Count(t))" ],
> "_db_hits" : 0
> },
> "dbHits" : 0,
> "name" : "EagerAggregation",
> "children" : [ {
> "args" : {
> "_rows" : 478380,
> "_db_hits" : 0,
> "pred" : "(NOT(j1 == j2) AND hasLabel(j2:jurt(3)))"
> },
> "dbHits" : 0,
> "name" : "Filter",
> "children" : [ {
> "args" : {
> "start" : {
> "identifiers" : [ "j1" ],
> "query" : "{jurtid}",
> "producer" : "SchemaIndex",
> "property" : "jurt_id",
> "label" : "jurt"
> },
> "trail" : "(j1)-[ UNNAMED15:HAS_TERM WHERE (hasLabel(NodeIdentifier():Term(1)) AND hasLabel(NodeIdentifier():Term(1))) AND true]->(t)<-[ UNNAMED37:HAS_TERM WHERE hasLabel(NodeIdentifier():jurt(3)) AND true]-(j2)",
> "_rows" : 478380,
> "_db_hits" : 478518
> },
> "dbHits" : 478518,
> "name" : "TraversalMatcher",
> "children" : [ ],
> "rows" : 478380
> } ],
> "rows" : 478380
> } ],
> "rows" : 9992
> } ],
> "rows" : 9992
> } ],
> "rows" : 9992
> } ],
> "rows" : 3
> } ],
> "rows" : 3
> }
> }
>
>
>
> Every time I change the value of the {jurtid} param, the first run of the
> query returns different results than the subsequent runs.
>
> When I provide two params, like this:
>
> match (j1:jurt)-[:HAS_TERM]->(t:Term)<-[:HAS_TERM]-(j2:jurt)
> where NOT j1=j2 AND ((j1.jurt_id = {id1}) OR (j1.jurt_id = {id2}))
> with j1,j2,count(t) as commonterms
> return j1.jurt_id,j2.jurt_id,commonterms
> order by commonterms desc limit 3
>
> it doesn't even return anything in the shell; in the browser I get an
> "Unknown error".
>
>
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.