Hi there, I have a similarly big graph at Mapillary right now, and am importing it transactionally. I stack up about 20,000 small operations that do lookups and create relationships, and commit a transaction at that interval. That works great with 10G of RAM for the JVM; 35M relationships take about 2h to import that way.
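In case it helps, the pattern looks roughly like the sketch below (embedded Java API against Neo4j 2.x; the label, batch size, CSV layout and in-memory name cache are illustrative choices, not my actual import code — a schema index lookup would work for the lookups too):

    // Minimal sketch of a batched transactional import with the embedded
    // Neo4j 2.x Java API. Assumes a CSV of "name1,name2,count" lines.
    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;

    public class BatchedImport {
        private static final int BATCH_SIZE = 20_000;            // operations per commit
        private static final Label NODE = DynamicLabel.label("Node");

        public static void main(String[] args) throws Exception {
            GraphDatabaseService db = new GraphDatabaseFactory()
                    .newEmbeddedDatabase("target/mydatabase");
            // Cache name -> node id so repeated names don't hit the store again.
            Map<String, Long> seen = new HashMap<>();

            try (BufferedReader in = new BufferedReader(new FileReader(args[0]))) {
                String line;
                Transaction tx = db.beginTx();
                int inBatch = 0;
                while ((line = in.readLine()) != null) {
                    String[] cols = line.split(",");
                    Node n1 = lookupOrCreate(db, seen, cols[0]);
                    Node n2 = lookupOrCreate(db, seen, cols[1]);
                    Relationship rel = n1.createRelationshipTo(n2,
                            DynamicRelationshipType.withName("REL"));
                    rel.setProperty("count", Integer.parseInt(cols[2]));

                    if (++inBatch >= BATCH_SIZE) {                // commit, start a fresh tx
                        tx.success();
                        tx.close();
                        tx = db.beginTx();
                        inBatch = 0;
                    }
                }
                tx.success();
                tx.close();
            }
            db.shutdown();
        }

        private static Node lookupOrCreate(GraphDatabaseService db,
                                           Map<String, Long> seen, String name) {
            Long id = seen.get(name);
            if (id != null) {
                return db.getNodeById(id);
            }
            Node node = db.createNode(NODE);
            node.setProperty("name", name);
            seen.put(name, node.getId());
            return node;
        }
    }

The important part is committing and reopening the transaction every N operations, so the transaction state never grows without bound.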
Make sure to run VisualVM or the Java 8 Flight Recorder to find out what is taking up the heap and CPU time; it might be parsing or big HashMaps somewhere.

HTH
/peter

G:  neubauer.peter
S:  peter.neubauer
P:  +46 704 106975
L:  http://www.linkedin.com/in/neubauer
T:  @peterneubauer <http://twitter.com/peterneubauer>

Open Data - @mapillary <http://mapillary.com/>
Open Source - @neo4j <http://neo4j.org/>
Open Future - @coderdojo <http://malmo.coderdojo.se/>


On Thu, Jul 24, 2014 at 5:03 PM, Johann Petrak <[email protected]> wrote:

> (if TL;DR, skip to the last paragraph :) )
>
> The following is all for Neo4j version 2.1.2 running under Linux, 64-bit.
> I am trying to create a Neo4j database by importing from a CSV file. The
> CSV file has three columns: node1Name, node2Name, count.
> I run
>
> neo4j-shell -path mydatabase
>
> and execute the following commands:
>
> CREATE CONSTRAINT ON (node:Node) ASSERT node.name IS UNIQUE;
>
> USING PERIODIC COMMIT 500
> LOAD CSV FROM 'file:/where/my/csv/file/is.csv' AS line
> MERGE (node1:Node {name:line[0]})
> MERGE (node2:Node {name:line[1]})
> CREATE (node1)-[:REL {count:toInt(line[2])}]->(node2);
>
> The CSV file contains about 12 million lines and there are about 5 million
> distinct nodes, most with just one or a few relationships between them and
> only a couple of thousand nodes with more than 1,000 or 10,000 relationships.
>
> I run the neo4j-shell command with 8G of maximum heap memory on a machine
> with 2 cores and 16GB of RAM in total. The odd thing is that the required
> heap memory seems to go up linearly over time, and eventually the process
> aborts with an out-of-memory condition.
>
> Another attempt on a larger machine, with a 20G max heap size and 16 cores,
> shows exactly the same behavior in jconsole. On that machine, the memory
> usage initially oscillates around 2G, then over the course of the next hour
> or so slowly rises to oscillate around 12G. Then there is a sharp rise and
> all of the 20G are consumed, without any chance to reclaim any of it!
>
> What could the reason for that be? I cannot imagine why simply adding nodes
> and relationships like this would cause a constant accumulation of
> non-reclaimable heap memory.
>
> UPDATE: while writing this I also tried splitting the CSV into smaller
> chunks (500,000 rows each): when I load each chunk during the same
> neo4j-shell session, the amount of memory that cannot be reclaimed goes up
> in the same way and eventually I get an out-of-memory exception, even with
> 20G. However, loading each chunk in a separate neo4j-shell session works
> just fine and never uses more than 4G of memory.
> This seems to indicate that there may be a memory leak somewhere, because
> that memory does not actually seem to be necessary to create the graph.
>
> johann

-- 
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/d/optout.
