I'm currently running a test with Neo4j CE 2.3.1 on a Windows 7 machine 
with 4GB memory and trying to understand how to manage memory allocation 
when importing from CSV using the Neo4jShell.

I am running these two commands: the first creates the nodes, and the 
second creates the edges (one edge per node).

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
CREATE (:EVENT {
  eventID:   line.eventID,
  name:      line.name,
  referrer:  line.referrer,
  sessionID: toInt(line.sessionID),
  timestamp: toInt(line.timestamp),
  pID:       toInt(line.pID)
});

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line 
MATCH (f:Feature)
WHERE f.name = line.name
MATCH (e:EVENT) 
WHERE e.eventID = line.eventID
MERGE (e)-[:FOR]->(f);
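One variant I've been experimenting with collapses each MATCH/WHERE pair 
into an inline property match, which I believe lets the planner hit both 
unique indexes directly (this is just a sketch; I'm not certain it 
actually changes the query plan):

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
MATCH (f:Feature { name: line.name })
MATCH (e:EVENT { eventID: line.eventID })
MERGE (e)-[:FOR]->(f);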

I have the following related indexes and constraints:

Indexes                                                          
  ON :EVENT(eventID) ONLINE (for uniqueness constraint) 
  ON :Feature(name)  ONLINE (for uniqueness constraint) 

Constraints
  ON (feature:Feature) ASSERT feature.name IS UNIQUE
  ON (event:EVENT) ASSERT event.eventID IS UNIQUE

With 5 million nodes already in the db, loading a CSV with another 5 
million nodes takes about 15 minutes and pushes memory usage to ~1.5GB. 
If I immediately run the second command to create the edges, memory 
climbs again and the command sometimes stalls partway through. To get 
the second command to complete reliably, I have to restart Neo4j first.

I'm trying to understand whether I can improve this by optimizing the 
commands somehow, or whether specifying memory settings in the 
properties files might help. If so, what's the best way to go about 
that?
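For context, these are the settings I'd expect to be adjusting, if I 
understand the 2.3 docs correctly (the values below are guesses for a 
4GB machine, not recommendations):

# conf/neo4j-wrapper.conf -- JVM heap for the Neo4j process
wrapper.java.initmemory=1024
wrapper.java.maxmemory=2048

# conf/neo4j.properties -- off-heap page cache for the store files
dbms.pagecache.memory=1g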

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.
