Hi Curtis,

If you do this:

USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:C:/test.txt" AS csvLine
CREATE (:Person { person_id: toInt(csvLine.person_id), name: csvLine.name });

It should do between 10k and 30k nodes per second.
It will be slower if you have a unique constraint in place.

Please run it in the Neo4j-Shell; it is much easier to handle there.
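For example, from the Neo4j install directory (a sketch, assuming a default 
Neo4j 2.x layout on Windows; import.cql is a placeholder for a file containing 
the statement above):

bin\Neo4jShell.bat -file import.cql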

As you are running on Windows with 4 GB of RAM in total, please also make sure 
that your mmio (memory-mapped I/O) config in neo4j.properties is not too large.

Perhaps 100 MB for nodes, 500 MB for relationships, and another 250 MB for 
properties.
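In Neo4j 2.x that would look roughly like this in conf/neo4j.properties (a 
sketch; tune the values to your actual store file sizes):

neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=250M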

On Windows the memory-mapping memory is taken from the Java heap, so with a 
4096 MB heap and roughly 850 MB mapped as above, only about 3.2 GB is left for 
transaction state and everything else.

You should see some info about that in your graph.db/messages.log file after 
startup. Feel free to share that file with us, and we can help you set up your 
config.

Michael

On 28.08.2014, at 11:11, 'Curtis Mosters' via Neo4j 
<[email protected]> wrote:

> Yesterday evening I ran it for 52 minutes, and then I got an "Unknown 
> Error".
> 
> So now I tested it with "USING PERIODIC COMMIT 10000", and it ran for about 
> 60 minutes. Then the same error.
> 
> I looked into the graph.db folder, and it is 1.75 GB in total; the 
> propertystore file, for example, is 370 MB.
> 
> So what else can I do to get it running in the browser? Or could I run this 
> task in the Neo4j-Shell?
> 
> On Thursday, 28 August 2014 00:16:07 UTC+2, Chris Vest wrote:
> All transaction state is currently kept in memory on the Java heap, and 20+ 
> million changes are too much to fit in a 4 GB heap.
> When there is too much on the heap, it manifests as those "GC overhead limit 
> exceeded" errors and the database runs slowly, though there are other things 
> that can produce similar symptoms.
> 
> Try putting USING PERIODIC COMMIT 10000 in front of your LOAD CSV query. This 
> will periodically commit the transaction, thus limiting the transaction state 
> kept in memory. Unfortunately it will also break the atomicity of the 
> transaction.
> 
> --
> Chris Vest
> System Engineer, Neo Technology
> [ skype: mr.chrisvest, twitter: chvest ]
> 
> 
> On 27 Aug 2014, at 22:31, 'Curtis Mosters' via Neo4j <[email protected]> 
> wrote:
> 
>> Let's say I have:
>> 
>> LOAD CSV WITH HEADERS FROM "file:C:/test.txt" AS csvLine
>> CREATE (p:Person { person_id: toInt(csvLine.person_id), name: csvLine.name })
>> 
>> I run this query in the browser. I know that it's not the fastest way and I 
>> should think about using the batch importer. But I really like this approach 
>> and want to speed it up.
>> 
>> When I ran this the first time, I got an error after 2 or 3 minutes saying 
>> "GC overhead limit exceeded". So I set
>> 
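>> # in conf/neo4j-wrapper.conf; values are in MB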
>> wrapper.java.initmemory=4096
>> wrapper.java.maxmemory=4096
>> 
>> Now the error does not come up. But it's still slow and I can't see how much 
>> time is still needed. So if you have tips on doing this, I would be very 
>> thankful. =)
>> 
>> PS: the file is 2 GB and has about 20 million entries.
>> 
