Hi Johann,
Good news: the Cypher team addressed the last issue in Neo4j 2.1.3, and now
I can load all 12M nodes and 12M rels in one go.
Please give it a try. Here is my test:

USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///home/michael/import/batch-import/rels.csv' AS line
FIELDTERMINATOR '\t'
WITH line
SKIP 1
WITH DISTINCT line[0] AS name
CREATE (:Node {name:name});

CREATE CONSTRAINT ON (n:Node) ASSERT n.name IS UNIQUE;

USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///home/michael/import/batch-import/rels.csv' AS line
FIELDTERMINATOR '\t'
WITH line
SKIP 1
WITH DISTINCT line[1] AS name
MERGE (:Node {name:name});

USING PERIODIC COMMIT 50000
LOAD CSV FROM 'file:///home/michael/import/batch-import/rels.csv' AS line
FIELDTERMINATOR '\t'
WITH line
SKIP 1
MATCH (node1:Node {name:line[0]})
MATCH (node2:Node {name:line[1]})
CREATE (node1)-[:REL {count:toInt(line[4])}]->(node2);

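
To sanity-check the result, a small script can mirror the three passes in
memory and report how many nodes and relationships the import should produce.
This is only a sketch and not part of the test above: the function name
expected_counts and its parameters are made up for illustration, and it
assumes a tab-separated file with one header line whose first two columns
hold the node names. It also keeps every distinct name in memory.

```python
import csv
import io


def expected_counts(csv_file, delimiter="\t", has_header=True):
    """Mirror the three import passes in memory and return (nodes, rels).

    Hypothetical helper for illustration; assumes columns 0 and 1 hold
    the start and end node names.
    """
    reader = csv.reader(csv_file, delimiter=delimiter)
    if has_header:
        next(reader, None)  # same effect as SKIP 1 in the Cypher statements
    names = set()
    rels = 0
    for row in reader:
        if len(row) < 2:
            continue  # assumption: ignore blank or malformed lines
        names.add(row[0])  # pass 1: CREATE for each distinct line[0]
        names.add(row[1])  # pass 2: MERGE for each distinct line[1]
        rels += 1          # pass 3: one relationship per data row
    return len(names), rels


# Tiny made-up sample; with a real file, pass open(path, newline="") instead.
sample = io.StringIO("start\tend\nA\tB\nB\tC\nA\tC\n")
print(expected_counts(sample))  # prints (3, 3)
```

The two numbers can then be compared with the output of
MATCH (n:Node) RETURN count(n); and MATCH ()-[r:REL]->() RETURN count(r);
after the import has finished.
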
On Sun, Jul 27, 2014 at 1:28 AM, Peter Neubauer <
[email protected]> wrote:
> Hi there,
> I have a similarly big graph at Mapillary right now, and am importing it
> transactionally. I stack up about 20000 small transactions that do lookups
> and create relationships, and commit them after that interval. This works
> great with 10g RAM for the JVM; 35M relationships take about 2h to import
> that way.
>
> Make sure to run VisualVM or Java 8 Flight Recorder to find out what is
> taking the heap and CPU time. It might be parsing or big HashMaps somewhere.
>
> HTH
>
> /peter
>
> On Thu, Jul 24, 2014 at 5:03 PM, Johann Petrak <[email protected]>
> wrote:
>
>> (If TL;DR, skip to the last paragraph :) )
>> The following is all for Neo4j version 2.1.2 running under 64-bit Linux:
>> I am trying to create a Neo4j database by importing from a CSV file. The CSV
>> file has three columns: node1Name, node2Name, count.
>> I run
>> neo4j-shell -path mydatabase
>> and execute the following commands:
>>
>> CREATE CONSTRAINT ON (node:Node) ASSERT node.name IS UNIQUE;
>>
>> USING PERIODIC COMMIT 500
>> LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
>> MERGE (node1:Node {name:line[0]})
>> MERGE (node2:Node {name:line[1]})
>> CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);
>>
>> The CSV file contains about 12 million lines and there are about 5
>> million different nodes. Most nodes have just one or a few relations between
>> them, and only a couple of thousand nodes have more than 1000 or 10000
>> relations.
>>
>> I run the neo4j-shell command with 8G of maximum heap memory on a machine
>> with 2 cores and 16GB RAM in total. The odd thing is that the required heap
>> memory seems to go up linearly over time and eventually the process aborts
>> with an out of memory condition.
>> Another attempt on a larger machine, with 20G max heap size and 16 cores,
>> shows exactly the same behavior in jconsole. On that machine, the
>> memory usage oscillates initially around 2G, then over the course of the
>> next hour or so slowly goes up to oscillate around 12G. Then there is a
>> sharp rise and all of the 20G are consumed, without any chance to reclaim
>> any of it!
>>
>> What could the reason for that be? I cannot imagine why simply adding
>> nodes and relations like this would cause a constant accumulation of
>> non-reclaimable heap memory.
>>
>> UPDATE: While writing this I also tried splitting the CSV up into smaller
>> chunks (500000 rows). When I load each chunk during the same neo4j-shell
>> session, the amount of memory that cannot be reclaimed goes up in the
>> same way and eventually I get an out of memory exception, even with 20G.
>> However, loading each chunk in a separate neo4j-shell session works just
>> fine and never uses more than 4G of memory.
>> This seems to indicate that there may be a memory leak somewhere,
>> because that memory does not actually seem to be necessary to
>> create the graph.
>>
>> johann
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>