Hi,

I am importing data into Neo4j using the batch importer CLI. The dataset has 
about a billion nodes and roughly 2B relationships. The nodes seem to have 
loaded in about an hour (if I am reading the output correctly), but the import 
is now in the "Prepare node index" step and has been there for more than 12 hours.

Nodes
[>:23.39 MB/s---|PROPERTIE|NODE:|LAB|*v:37.18 MB/s---------------------------------------------]  1B
Done in 1h 7m 18s 54ms
Prepare node index
[*SORT:11.52 GB--------------------------------------------------------------------------------]881M

Any idea why it is so slow? I don't think it is meant to take this long. I can 
think of a few things that might speed it up (which I am trying on another 
instance), but I would like feedback from folks first:
1. Split the creation of the nodes and relationships into 2 separate commands
2. Create indexes after the nodes have been created
3. Use MATCH/MERGE to get rid of duplicates
4. Run the import for the relationships
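
For step 3, an alternative to MATCH/MERGE after the import would be to drop 
duplicate IDs from the node CSV before importing. A minimal sketch in Python; 
the function names are made up for illustration, and it assumes the file has a 
header row, the node ID is in the first column, and the set of distinct IDs 
fits in memory (which may well not hold for a billion nodes):

```python
import csv
import io


def dedupe_rows(rows, id_column=0):
    """Yield rows whose ID has not been seen yet, keeping the first occurrence."""
    seen = set()  # assumption: all distinct IDs fit in memory
    for row in rows:
        if not row:
            continue  # skip blank lines
        key = row[id_column]
        if key in seen:
            continue  # later duplicates are dropped
        seen.add(key)
        yield row


def dedupe_csv(src, dst, id_column=0):
    """Copy CSV from src to dst without duplicate IDs; return rows written."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader, None)
    if header is None:
        return 0  # empty input, nothing to write
    writer.writerow(header)
    count = 0
    for row in dedupe_rows(reader, id_column):
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    # Tiny in-memory example; real files would be opened with open(..., newline="").
    src = io.StringIO("id:ID,name\n1,a\n2,b\n1,c\n")
    dst = io.StringIO()
    dedupe_csv(src, dst)
    print(dst.getvalue(), end="")
```

At this scale a sort-based pass over the file (sort by ID, then drop adjacent 
repeats) would probably be needed instead of an in-memory set.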

Do you think this will help? If not, what am I doing wrong?
Thanks
Neha.

PS: My heap size is large(ish), around 25G. Could that be an issue?
