I am doing a POC on a publicly available Twitter dataset for our project. I
was able to create the Neo4j database for it using Michael Hunger's Batch
Inserter utility, and it was relatively fast (it took just 2 h 53 min to
finish). All in all there were:
15,203,731 Nodes, with 2 properties (name, url)
256,147,121 Relationships, with 1 property
Next, I wrote Cypher queries to update the Twitter database. In the CSVs I
added a new property (Age) for the nodes and a new property (FollowedSince)
for the relationships. This is where things start to look bad: the query that
updates the relationships (see below) takes forever to run.
USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM {csvfile} AS row FIELDTERMINATOR '\t'
MATCH (u1:USER {name:row.`name:string:user`}), (u2:USER {name:row.`name:string:user2`})
MERGE (u1)-[r:Follows]->(u2)
SET r.Property=row.Property, r.FollowedSince=row.FollowedSince
I had already created the index beforehand by running:
CREATE INDEX ON :USER(name);
My neo4j.properties settings:
allow_store_upgrade=true
dump_configuration=false
cache_type=none
use_memory_mapped_buffers=true
neostore.propertystore.db.index.keys.mapped_memory=260M
neostore.propertystore.db.index.mapped_memory=260M
neostore.nodestore.db.mapped_memory=768M
neostore.relationshipstore.db.mapped_memory=12G
neostore.propertystore.db.mapped_memory=2048M
neostore.propertystore.db.strings.mapped_memory=2048M
neostore.propertystore.db.arrays.mapped_memory=260M
node_auto_indexing=true
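For reference, here is a quick sanity check of how much of the server's 32 GB these mapped_memory settings claim in total. It is only a rough sketch: the values are copied from the settings above, 12G is taken as 12 * 1024 MB, and the JVM heap (which I have not listed) is not included.

```python
# Mapped-memory settings copied from the configuration above, in MB.
# Assumption: "12G" means 12 * 1024 MB. JVM heap is NOT counted here.
mapped_memory_mb = {
    "neostore.propertystore.db.index.keys": 260,
    "neostore.propertystore.db.index": 260,
    "neostore.nodestore.db": 768,
    "neostore.relationshipstore.db": 12 * 1024,
    "neostore.propertystore.db": 2048,
    "neostore.propertystore.db.strings": 2048,
    "neostore.propertystore.db.arrays": 260,
}

server_ram_mb = 32 * 1024  # 32 GB server RAM

total_mb = sum(mapped_memory_mb.values())
remaining_mb = server_ram_mb - total_mb

print("total mapped memory: %d MB (%.1f GB)" % (total_mb, total_mb / 1024.0))
print("left for heap and OS: %d MB (%.1f GB)" % (remaining_mb, remaining_mb / 1024.0))
```

So about 17.5 GB is memory-mapped, leaving roughly 14.5 GB for the heap and the operating system.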
What should I do to speed up my Cypher query? As of this writing, more than
an hour has passed and the relationship update (10,000,747 relationships)
still hasn't finished. The node update (15,203,731 nodes), which finished
earlier, took 34 minutes, which I think is far too long. The Batch Inserter
utility processed all the nodes in just 5 minutes!
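To put those timings side by side, here is a back-of-the-envelope rate calculation. It is just a sketch using the figures above; the relationship figure is only an upper bound, since that update had been running for more than an hour and had not finished. The helper name rate_per_second is mine.

```python
def rate_per_second(count, minutes):
    """Average number of items processed per second."""
    return count / (minutes * 60.0)

# Figures taken from this post.
cypher_node_rate = rate_per_second(15203731, 34)   # Cypher node update, 34 min
batch_node_rate = rate_per_second(15203731, 5)     # Batch Inserter, 5 min
# Upper bound only: still unfinished after more than 60 minutes.
cypher_rel_rate_max = rate_per_second(10000747, 60)

print("Cypher node update:    %.0f nodes/s" % cypher_node_rate)
print("Batch Inserter nodes:  %.0f nodes/s" % batch_node_rate)
print("Cypher rel update:   < %.0f rels/s" % cypher_rel_rate_max)
print("Batch Inserter is %.1fx faster on nodes" % (batch_node_rate / cypher_node_rate))
```

That works out to roughly 7,450 nodes/s for the Cypher update versus about 50,700 nodes/s for the Batch Inserter, and fewer than 2,800 relationships/s for the query that is still running.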
I did test my queries on a small dataset first, before tackling the bigger
one, and they worked.
Other details about my setup:
Neo4j version = 2.1.6
OS = CentOS 6.6
Server RAM = 32 GB
Any advice please? Thanks.