I'm currently running a test with Neo4j CE 2.3.1 on a Windows 7 machine
with 4GB memory and trying to understand how to manage memory allocation
when importing from CSV using the Neo4jShell.
I am running these two commands: the first creates the nodes, and the
second creates the relationships (one edge per node).
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
CREATE (:EVENT { eventID: line.eventID, name: line.name,
  referrer: line.referrer, sessionID: toInt(line.sessionID),
  timestamp: toInt(line.timestamp), pID: toInt(line.pID) });
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
MATCH (f:Feature)
WHERE f.name = line.name
MATCH (e:EVENT)
WHERE e.eventID = line.eventID
MERGE (e)-[:FOR]->(f);
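(For what it's worth, one variation I've been considering is lowering the batch size and putting the properties inline in the MATCH patterns, so each transaction holds less state and the lookups go straight through the unique indexes. A sketch, using the same labels and properties as above; the batch size of 1000 is just a guess to tune:

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
MATCH (f:Feature { eventID: line.name })
MATCH (e:EVENT { eventID: line.eventID })
MERGE (e)-[:FOR]->(f);

I don't know whether the inline form actually plans differently from the WHERE form in 2.3, so treat it as an experiment.)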
I have the following related indexes and constraints:
Indexes
ON :EVENT(eventID) ONLINE (for uniqueness constraint)
ON :Feature(name) ONLINE (for uniqueness constraint)
Constraints
ON (feature:Feature) ASSERT feature.name IS UNIQUE
ON (event:EVENT) ASSERT event.eventID IS UNIQUE
When the database already holds 5 million nodes and I load a CSV with
another 5 million, the import takes about 15 minutes and memory usage
climbs to ~1.5GB. If I then immediately run the second command to create
the edges, memory starts climbing again and the command sometimes stalls
partway through. To get the second command to complete reliably, I have
to restart Neo4j first.
I'm trying to understand whether I can improve this by optimizing the
commands somehow, or whether specifying memory settings in the properties
file might help; if so, how best to go about that?
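(On the memory settings side, my current understanding is that in 2.3 the JVM heap is set in conf/neo4j-wrapper.conf and the page cache in conf/neo4j.properties. Something like the following is what I'd try on a 4GB machine; the exact sizes are guesses that would need tuning, leaving room for Windows itself:

# conf/neo4j-wrapper.conf -- JVM heap, in MB
wrapper.java.initmemory=2048
wrapper.java.maxmemory=2048

# conf/neo4j.properties -- page cache for the store files
dbms.pagecache.memory=1g

Is that the right pair of knobs, and is splitting roughly 2GB heap / 1GB page cache a sensible starting point here?)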
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.