Hi, sorry you ran into this issue: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
See:

profile LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows]->(u2)
ON CREATE SET r.Property=row.Property, r.FollowedSince=row.FollowedSince;

+----------------+------+--------+-------------+------------------------------------------------------+
| Operator       | Rows | DbHits | Identifiers | Other                                                |
+----------------+------+--------+-------------+------------------------------------------------------+
| EmptyResult    |    0 |      0 |             |                                                      |
| UpdateGraph    |    0 |      0 | u1, u2, r   | MergePattern                                         |
| Eager          |    0 |      0 |             |                                                      |
| Filter(0)      |    0 |      0 |             | Property(u2,name) == Property(row,name:string:user2) |
| NodeByLabel(0) |    0 |      0 | u2, u2      | :USER                                                |
| Filter(1)      |    0 |      0 |             | Property(u1,name) == Property(row,name:string:user)  |
| NodeByLabel(1) |    0 |    100 | u1, u1      | :USER                                                |
| LoadCSV        |  100 |      0 | row         |                                                      |
+----------------+------+--------+-------------+------------------------------------------------------+

Unfortunately both SET and even the more efficient ON CREATE SET run into this (a fix is due for 2.1.7). If you change your query to set the properties inside the MERGE pattern instead, it should work:

profile LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);

I would also consider changing the MERGE to a CREATE, which is faster, if there is a low chance of duplicates. I'd probably also use a smaller periodic commit value, but that should be tested.
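To make the MERGE-to-CREATE suggestion concrete, here is a sketch of what that variant could look like against the same file and headers (the smaller periodic commit value of 1000 is just an illustrative starting point, not a tuned number):

```cypher
// CREATE skips the relationship-existence check that MERGE performs,
// so it is faster -- but it will happily create duplicate :Follows
// relationships if the same row appears twice in the CSV.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
CREATE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);
```

Only use this when you know the input data contains no duplicate user pairs.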
You can test it on a subset of your dataset with:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
WITH row LIMIT 100000
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);

+----------------+------+--------+-------------+------------------------------------------------------+
| Operator       | Rows | DbHits | Identifiers | Other                                                |
+----------------+------+--------+-------------+------------------------------------------------------+
| EmptyResult    |    0 |      0 |             |                                                      |
| UpdateGraph    |    0 |      0 | u1, u2, r   | MergePattern                                         |
| Filter(0)      |    0 |      0 |             | Property(u2,name) == Property(row,name:string:user2) |
| NodeByLabel(0) |    0 |      0 | u2, u2      | :USER                                                |
| Filter(1)      |    0 |      0 |             | Property(u1,name) == Property(row,name:string:user)  |
| NodeByLabel(1) |    0 |    100 | u1, u1      | :USER                                                |
| LoadCSV        |  100 |      0 | row         |                                                      |
+----------------+------+--------+-------------+------------------------------------------------------+

> On 20.01.2015 at 03:49, straycat <[email protected]> wrote:
>
> I was doing a POC on a publicly-available Twitter dataset for our project. I was able to create the Neo4j database for it using Michael Hunger's Batch Inserter utility, and it was relatively fast (it took just 2 hours and 53 minutes to finish). All in all there were:
>
> 15,203,731 nodes, with 2 properties (name, url)
> 256,147,121 relationships, with 1 property
>
> Now I created a Cypher query to update the Twitter database. I added a new property (Age) on the nodes and a new property (FollowedSince) on the relationships in the CSVs. Now things start to look bad: the query to update the relationships (see below) takes forever to run.
>
> USING PERIODIC COMMIT 10000
> LOAD CSV WITH HEADERS FROM {csvfile} AS row FIELDTERMINATOR '\t'
> MATCH (u1:USER {name:row.`name:string:user`}), (u2:USER {name:row.`name:string:user2`})
> MERGE (u1)-[r:Follows]->(u2)
> SET r.Property=row.Property, r.FollowedSince=row.FollowedSince
>
> I already pre-created the index by running:
> CREATE INDEX ON :USER(name);
>
> My neo4j properties:
>
> allow_store_upgrade=true
> dump_configuration=false
> cache_type=none
> use_memory_mapped_buffers=true
> neostore.propertystore.db.index.keys.mapped_memory=260M
> neostore.propertystore.db.index.mapped_memory=260M
> neostore.nodestore.db.mapped_memory=768M
> neostore.relationshipstore.db.mapped_memory=12G
> neostore.propertystore.db.mapped_memory=2048M
> neostore.propertystore.db.strings.mapped_memory=2048M
> neostore.propertystore.db.arrays.mapped_memory=260M
>
> node_auto_indexing=true
>
> I'd like to know what I should do to speed up my Cypher query. As of this writing, more than an hour has passed and my relationship update (10,000,747 rows) still hasn't finished. The node update (15,203,731 rows) that finished earlier clocked in at 34 minutes, which I think is way too long. The Batch Inserter utility processed all the nodes in just 5 minutes!
>
> I did test my queries on a small dataset first before tackling the bigger one, and they did work.
>
> My other Neo4j information:
> Neo4j version = 2.1.6
> OS = CentOS 6.6
> Server RAM = 32 GB
>
> Any advice please? Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
