Hi Rik! ...in minutes?
I'd like to understand how I could get closer to that result, though I will also try that library. It seems strange to me: whether I use the LOAD CSV functionality from the shell or run a transaction each time, I seem to run into a heap-memory problem. Why should the batch import from the shell be so much slower than the batch-import script?

Also, I see the importer is flexible enough, but my custom file (an adjacency list, to avoid redundancy) is more than 1 GB; if I expand it into a CSV full of redundancy (node-rel-neighbor1, node-rel-neighbor2, ...), it will be much, much bigger, and I am worried about whether it can be handled.

A question: in rel.csv (https://github.com/jexp/batch-import/tree/20) I read that node IDs start from 0. Are they temporary IDs, or are they mandatory? E.g., what if I wanted to upload another subgraph into the same DB with the batch importer (clearly without overriding the existing nodes)?

On Tuesday, August 12, 2014 6:46:00 PM UTC+2, Rik Van Bruggen wrote:
>
> I think you should use the batch importer for this size of a graph. You
> will be done in minutes, not hours.
>
> https://github.com/jexp/batch-import/tree/20
>
> Rik
>
> On Tuesday, August 12, 2014 5:13:39 PM UTC+1, gg4u wrote:
>>
>> Hello,
>>
>> Here I am trying to upload a massive network:
>> 4M nodes, 100M correlations.
>>
>> Having problems with memory and performance, I'd like to know if I am
>> doing it OK:
>>
>> 1. Before loading the correlations, I wanted to load the nodes.
>>
>> 2. Set up neo4j-wrapper and neo4j.properties as written in
>> http://www.neo4j.org/graphgist?d788e117129c3730a042
>> with the JVM heap set at 4096 MB.
>>
>> With this setting, the bulk load of 4M nodes failed.
>>
>> 3. Raised the min-heap and max-heap memory to 6144 MB and
>> ran a test with 100K nodes.
>>
>> I got:
>> Nodes created: 98991
>> Properties set: 197982
>> Labels added: 98991
>> 3438685 ms
>>
>> Almost an hour to upload 100K nodes with two properties?
>> I thought it should be much faster.
>>
>> Am I doing something wrong?
>> This is the importer code I used:
>>
>> CREATE CONSTRAINT ON (n:MYNODES) ASSERT n.id IS UNIQUE;
>> CREATE INDEX ON :MYNODES(name);
>>
>> USING PERIODIC COMMIT 1000
>> LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
>> FIELDTERMINATOR '\t'
>> WITH line, toInt(line.topicId) AS id, line.name AS name LIMIT 100000
>> MERGE (n:MYNODES { id: id, name: name });

--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
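A side note on the quoted MERGE (my own guess, not something confirmed in the thread): because it matches on both id and name, it cannot use the unique constraint on :MYNODES(id), so each row may fall back to a label scan, which would explain the hour-long run. The usual pattern is to MERGE on the constrained key only and set the other property on create; a sketch, reusing the file name and column names from the post:

```cypher
// Unique constraint on id, as in the original post.
CREATE CONSTRAINT ON (n:MYNODES) ASSERT n.id IS UNIQUE;

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
FIELDTERMINATOR '\t'
// MERGE on the constrained key only, so the lookup can use the index;
// set the remaining property only when the node is first created.
MERGE (n:MYNODES { id: toInt(line.topicId) })
ON CREATE SET n.name = line.name;
```

One caveat: if two rows share an id but carry different names, this keeps the first name seen instead of creating a duplicate node, which may or may not be what you want.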
