Hi everyone,
I'm relatively new to Neo4j and I'm running into slowness when trying to 
insert a batch of data. My strategy has been to write the batch of Cypher 
queries to a text file and then pipe that file into neo4j-shell. Here is a 
description of my data.

Start with a single "user" node.
Create many "attribute" nodes, unless they already exist.
Create relationships between the "user" node and the "attribute" nodes.

In my benchmarking, I'm creating 10,000 attribute nodes and relationships 
from the user to the attributes. The caveat is that an attribute node may 
already exist, and if it does I want to use the existing one instead of 
creating a new one. My current approach uses the MERGE command to create 
each attribute node (or return the existing node if it already exists). My 
Cypher queries look something like this:

MERGE (a:Attribute {coordinate: '#{key}'}) WITH a 
MATCH (u:User {name: '#{user}'}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a)

Running 10,000 sequential queries like this to insert my data is quite 
slow. I'm getting somewhere around 20 inserts per second, so the full batch 
takes roughly 500 seconds. Here are some things I've tried in order to 
optimize it:

-Batching these into one large transaction in a text file and piping it into 
neo4j-shell
-Batching these into one large single command in a text file and piping it 
into neo4j-shell
-Breaking the work into parallel jobs and inserting multi-threaded. Each 
query must be its own transaction, otherwise it locks.
-Separating the MERGE commands into one batch and the CREATE relationship 
commands into a separate batch
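
For reference, the first item in the list can be sketched roughly as 
follows. This is only an illustration in Python, not necessarily the exact 
script used: the names build_batch and cypher_quote and the chunk size are 
made up for the example, and it assumes that neo4j-shell accepts "begin" and 
"commit" lines to group the statements between them into one transaction.

```python
def cypher_quote(value):
    """Escape backslashes and single quotes for a single-quoted Cypher string."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def build_batch(user, keys, chunk_size=1000):
    """Build neo4j-shell input: one statement per key, grouped into transactions.

    chunk_size is a hypothetical tuning knob; each chunk is wrapped in
    begin/commit (assumed neo4j-shell transaction commands).
    """
    lines = []
    for start in range(0, len(keys), chunk_size):
        lines.append("begin")
        for key in keys[start:start + chunk_size]:
            # Same query shape as in the post, with the values filled in.
            lines.append(
                "MERGE (a:Attribute {coordinate: '%s'}) WITH a "
                "MATCH (u:User {name: '%s'}) "
                "CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);"
                % (cypher_quote(key), cypher_quote(user))
            )
        lines.append("commit")
    return "\n".join(lines) + "\n"
```

Writing the returned string to a text file gives the kind of batch file 
that is then piped into neo4j-shell as described above.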

I've run the file system benchmark from the Linux performance guide 
(http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html) 
and my results are great. Based on that benchmark, I should be able to get 
upwards of 70k records/sec.

Can anyone advise on the best strategy for importing this type of data 
quickly?

