Hi, sorry you ran into this issue: http://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/
See:

profile LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows]->(u2)
ON CREATE SET r.Property=row.Property, r.FollowedSince=row.FollowedSince;

+----------------+------+--------+-------------+------------------------------------------------------+
| Operator       | Rows | DbHits | Identifiers | Other                                                |
+----------------+------+--------+-------------+------------------------------------------------------+
| EmptyResult    |    0 |      0 |             |                                                      |
| UpdateGraph    |    0 |      0 | u1, u2, r   | MergePattern                                         |
| Eager          |    0 |      0 |             |                                                      |
| Filter(0)      |    0 |      0 |             | Property(u2,name) == Property(row,name:string:user2) |
| NodeByLabel(0) |    0 |      0 | u2, u2      | :USER                                                |
| Filter(1)      |    0 |      0 |             | Property(u1,name) == Property(row,name:string:user)  |
| NodeByLabel(1) |    0 |    100 | u1, u1      | :USER                                                |
| LoadCSV        |  100 |      0 | row         |                                                      |
+----------------+------+--------+-------------+------------------------------------------------------+

Unfortunately both SET and even the more efficient ON CREATE SET run into this (a fix is due for 2.1.7). If you change your query to set the properties inside the MERGE pattern instead, it should work:

profile LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);

I would also consider changing the MERGE to a CREATE, which is faster, if there is a low chance of duplicates. I'd probably also use a smaller periodic commit value, but that should be tested.
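To make the MERGE-to-CREATE suggestion concrete, here is a sketch of what that variant could look like against the same file and headers (the smaller periodic commit value of 1000 is just an illustrative starting point, not a tuned number):

```cypher
// CREATE skips the relationship-existence check that MERGE performs,
// so it is faster -- but it will happily create duplicate :Follows
// relationships if the same row appears twice in the CSV.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
CREATE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);
```

Only use this when you know the input data contains no duplicate user pairs.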
You can test it on a subset of your dataset with:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "https://dl.dropboxusercontent.com/u/14493611/100_rows.csv" AS row FIELDTERMINATOR '\t'
WITH row LIMIT 100000
MATCH (u1:USER { name:row.`name:string:user` }),(u2:USER { name:row.`name:string:user2` })
MERGE (u1)-[r:Follows {Property:row.Property, FollowedSince:row.FollowedSince}]->(u2);

+----------------+------+--------+-------------+------------------------------------------------------+
| Operator       | Rows | DbHits | Identifiers | Other                                                |
+----------------+------+--------+-------------+------------------------------------------------------+
| EmptyResult    |    0 |      0 |             |                                                      |
| UpdateGraph    |    0 |      0 | u1, u2, r   | MergePattern                                         |
| Filter(0)      |    0 |      0 |             | Property(u2,name) == Property(row,name:string:user2) |
| NodeByLabel(0) |    0 |      0 | u2, u2      | :USER                                                |
| Filter(1)      |    0 |      0 |             | Property(u1,name) == Property(row,name:string:user)  |
| NodeByLabel(1) |    0 |    100 | u1, u1      | :USER                                                |
| LoadCSV        |  100 |      0 | row         |                                                      |
+----------------+------+--------+-------------+------------------------------------------------------+

> On 20.01.2015 at 03:49, straycat <[email protected]> wrote:
>
> I was doing a POC on a publicly-available Twitter dataset for our project. I was able to create the Neo4j database for it using Michael Hunger's Batch Inserter utility, and it was relatively fast (it took just 2 hours and 53 minutes to finish). All in all there were:
>
> 15,203,731 nodes, with 2 properties (name, url)
> 256,147,121 relationships, with 1 property
>
> Now I created a Cypher query to update the Twitter database. I added a new property (Age) on the nodes and a new property (FollowedSince) on the relationships in the CSVs. Now things start to look bad: the query to update the relationships (see below) takes forever to run.
>
> USING PERIODIC COMMIT 10000
> LOAD CSV WITH HEADERS FROM {csvfile} AS row FIELDTERMINATOR '\t'
> MATCH (u1:USER {name:row.`name:string:user`}), (u2:USER {name:row.`name:string:user2`})
> MERGE (u1)-[r:Follows]->(u2)
> SET r.Property=row.Property, r.FollowedSince=row.FollowedSince
>
> I already pre-created the index by running:
> CREATE INDEX ON :USER(name);
>
> My neo4j properties:
>
> allow_store_upgrade=true
> dump_configuration=false
> cache_type=none
> use_memory_mapped_buffers=true
> neostore.propertystore.db.index.keys.mapped_memory=260M
> neostore.propertystore.db.index.mapped_memory=260M
> neostore.nodestore.db.mapped_memory=768M
> neostore.relationshipstore.db.mapped_memory=12G
> neostore.propertystore.db.mapped_memory=2048M
> neostore.propertystore.db.strings.mapped_memory=2048M
> neostore.propertystore.db.arrays.mapped_memory=260M
>
> node_auto_indexing=true
>
> I'd like to know what I should do to speed up my Cypher query. As of this writing, more than an hour has passed and my relationship update (10,000,747 rows) still hasn't finished. The node update (15,203,731 rows) that finished earlier clocked in at 34 minutes, which I think is way too long. The Batch Inserter utility processed all the nodes in just 5 minutes!
>
> I did test my queries on a small dataset first before tackling the bigger one, and they did work.
>
> My other Neo4j information:
> Neo4j version = 2.1.6
> OS = CentOS 6.6
> Server RAM = 32 GB
>
> Any advice please? Thanks.
>
> --
> You received this message because you are subscribed to the Google Groups "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
