Hi Johann,
The problem is that when a query combines MATCH or MERGE with CREATE,
Cypher executes all of the MATCH/MERGE clauses up front, before any
CREATE, so that the query cannot get into a recursive loop by matching
what it has just created. I think this will be addressed in the new Cypher
compiler, which is currently being worked on.
For now, a workaround could be to import the file in chunks, something
like this:

USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
WITH line SKIP 0 LIMIT 500000
MERGE (node1:Node {name:line[0]})
MERGE (node2:Node {name:line[1]})
CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);

USING PERIODIC COMMIT 500
LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
WITH line SKIP 500000 LIMIT 500000
MERGE (node1:Node {name:line[0]})
MERGE (node2:Node {name:line[1]})
CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);

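With about 12 million lines and 500000 rows per chunk, that comes to 24
statements, so it may be easier to generate them than to type them. A
minimal sketch in Python, assuming the row count is known up front; the
function name and its parameters are made up for illustration, and the
statements put SKIP/LIMIT on a WITH clause, since Cypher only accepts them
as part of WITH or RETURN:

```python
def chunked_import_statements(csv_url, total_rows, chunk_size=500000,
                              commit_size=500):
    """Return one LOAD CSV statement per chunk of `chunk_size` rows.

    Hypothetical helper: it only builds the Cypher text, it does not
    talk to Neo4j.
    """
    # Doubled braces are literal braces in str.format templates.
    template = (
        "USING PERIODIC COMMIT {commit}\n"
        "LOAD CSV FROM '{url}' AS line\n"
        "WITH line SKIP {skip} LIMIT {limit}\n"
        "MERGE (node1:Node {{name:line[0]}})\n"
        "MERGE (node2:Node {{name:line[1]}})\n"
        "CREATE (node1)-[:REL {{count:toInt(line[2])}}]->(node2);"
    )
    return [
        template.format(commit=commit_size, url=csv_url,
                        skip=skip, limit=chunk_size)
        for skip in range(0, total_rows, chunk_size)
    ]


if __name__ == "__main__":
    # Row count taken from the "about 12 million lines" in the question.
    statements = chunked_import_statements(
        "file:/where/my/csv/file/is.csv", 12000000)
    print("\n\n".join(statements))
```

The output can be pasted into neo4j-shell one statement at a time.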
Hope that helps.
Mark
On 24 July 2014 16:03, Johann Petrak <[email protected]> wrote:
> (if TL;DR, skip to the last paragraph :) )
> The following is all for neo4j version 2.1.2 running under Linux, 64-bit:
> I am trying to create a neo4j database by importing from a CSV file. The CSV
> file has three columns: node1Name, node2Name, count.
> I run
> neo4j-shell -path mydatabase
> and execute the following commands:
>
> CREATE CONSTRAINT ON (node:Node) ASSERT node.name IS UNIQUE;
>
> USING PERIODIC COMMIT 500
> LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
> MERGE (node1:Node {name:line[0]})
> MERGE (node2:Node {name:line[1]})
> CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);
>
> The CSV file contains about 12 million lines and there are about 5 million
> different nodes, most nodes with just 1 or a few relations between them,
> and only a couple of thousand nodes with more than 1000 or 10000 relations.
>
> I run the neo4j-shell command with 8G of maximum heap memory on a machine
> with 2 cores and 16GB RAM in total. The odd thing is that the required heap
> memory seems to go up linearly over time and eventually the process aborts
> with an out of memory condition.
> Another attempt on a larger machine, with 20G max heapsize and 16 cores
> shows exactly the same behavior in jconsole. On that machine, the
> memory usage oscillates initially around 2G, then over the course of the
> next hour or so goes slowly up to oscillate around 12G. Then there is a
> sharp rise and all of the 20G are consumed, without any chance to re-claim
> any of it!
>
> What could the reason for that be? I cannot imagine why simply adding
> nodes and relations like this would cause a constant accumulation of
> non-reclaimable heap memory.
>
> UPDATE: while writing this I also tried to split the CSV up into smaller
> chunks (500000 rows): when I load each chunk during the same neo4j-shell
> session, the amount of memory that cannot get re-claimed goes up in the
> same way and eventually I get an out of memory exception, even with 20G.
> However, loading each chunk in a separate neo4j-shell session works just
> fine and never uses more than 4G of memory.
> This seems to indicate that there may be some memory leak somewhere,
> because that memory does not actually seem to be necessary after all to
> create the graph?
>
> johann
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>