(If TL;DR, skip to the last paragraph :) )
The following is all for neo4j version 2.1.2 running under Linux, 64-bit:
I am trying to create a neo4j database by importing from a CSV file. The CSV file
has three columns: node1Name, node2Name, count.
I run
neo4j-shell -path mydatabase
and execute the following commands:
CREATE CONSTRAINT ON (node:Node) ASSERT node.name IS UNIQUE;
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
MERGE (node1:Node {name:line[0]})
MERGE (node2:Node {name:line[1]})
CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);
The CSV file contains about 12 million lines and there are about 5 million
distinct nodes. Most nodes have just one or a few relationships, and only
a couple of thousand nodes have more than 1,000 or 10,000 relationships.
I run the neo4j-shell command with 8G of maximum heap memory on a machine
with 2 cores and 16GB RAM in total. The odd thing is that the required heap
memory seems to go up linearly over time and eventually the process aborts
with an out of memory condition.
Another attempt on a larger machine, with 20G max heap size and 16 cores,
shows exactly the same behavior in jconsole. On that machine, the
memory usage initially oscillates around 2G, then over the course of the
next hour or so slowly climbs until it oscillates around 12G. Then there is
a sharp rise and all of the 20G is consumed, without any chance to reclaim
any of it!
What could the reason for that be? I cannot imagine why simply adding nodes
and relationships like this would cause a constant accumulation of
non-reclaimable heap memory.
UPDATE: while writing this I also tried splitting the CSV into smaller
chunks (500000 rows each): when I load all chunks during the same neo4j-shell
session, the amount of memory that cannot be reclaimed goes up in the
same way and eventually I get an out of memory exception, even with 20G.
However, loading each chunk in a separate neo4j-shell session works just
fine and never uses more than 4G of memory.
This seems to indicate that there may be a memory leak somewhere,
because that memory is evidently not needed to create the graph.
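In case it helps anyone reproduce this, the per-chunk workaround can be scripted roughly as below. This is only a sketch under some assumptions: it relies on neo4j-shell accepting a -file option to run a script non-interactively (please check that your version has it), the function names and the NEO4J_SHELL override variable are made up for illustration, and the chunk size and paths are placeholders.

```shell
#!/bin/sh
# Sketch only: make_chunk_scripts, load_chunks and NEO4J_SHELL are
# hypothetical names, not part of neo4j.

# Split a CSV into chunks of <rows> lines and write one Cypher script per chunk.
# Usage: make_chunk_scripts <csv> <rows-per-chunk> <output-dir>
make_chunk_scripts() {
    csv=$1
    rows=$2
    mkdir -p "$3" || return 1
    out=$(cd "$3" && pwd)          # absolute path, needed for the file: URL
    split -l "$rows" "$csv" "$out/chunk_" || return 1
    for chunk in "$out"/chunk_*; do
        case $chunk in *.cql) continue ;; esac   # skip scripts from earlier runs
        cat > "$chunk.cql" <<EOF
USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:$chunk' AS line
MERGE (node1:Node {name:line[0]})
MERGE (node2:Node {name:line[1]})
CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);
EOF
    done
}

# Run every chunk script in its own neo4j-shell process, so each chunk
# gets a fresh JVM and heap. Assumes the -file option exists.
# Usage: load_chunks <database-dir> <output-dir>
load_chunks() {
    db=$1
    for script in "$2"/chunk_*.cql; do
        "${NEO4J_SHELL:-neo4j-shell}" -path "$db" -file "$script" || return 1
    done
}
```

With these defined, something like `make_chunk_scripts /where/my/csv/file/is.csv 500000 chunks` followed by `load_chunks mydatabase chunks` would mirror what I did by hand. The unique constraint still has to be created once beforehand, and the loop stops at the first chunk that fails.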
johann