Batch import is completely different from LOAD CSV:

- LOAD CSV is a transactional import into a running server.
- batch-import is a non-transactional, all-or-nothing import that writes the Neo4j store files directly. The server is not running at that time; after the import you start the server on top of the generated store files.

Hope that makes sense.

Rik
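[Editor's note: a rough sketch of what input for the jexp/batch-import tool discussed in this thread typically looks like. The file names, column names and invocation below are illustrative assumptions, not taken from this thread; check the repository's README for the authoritative header syntax.]

    nodes.csv  (tab-separated; in the basic setup, rows are numbered from 0 in import order)
    name	age:int
    alice	33
    bob	42

    rels.csv  (start/end refer to the node numbers assigned above)
    start	end	type	weight:float
    0	1	KNOWS	0.8

    # run against a non-running, fresh store, then start the server on graph.db afterwards
    ./import.sh graph.db nodes.csv rels.csv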
On Tue, Aug 12, 2014 at 6:44 PM, gg4u <[email protected]> wrote:

> Hi Rik!
>
> ...in minutes?
>
> I'd like to understand how I could get closer to that result, though I
> will also try that library.
>
> It's kind of strange to me, because both when using the LOAD CSV
> functionality from the shell and when doing a transaction each time, it
> looks like I run into a memory heap problem.
>
> Why should the batch import from the shell be so much slower than the
> batch-import script?
>
> Also, I see the importer is flexible enough, but my custom file (an
> adjacency list, to avoid redundancy) is more than 1 GB; if I expand it
> into a CSV full of redundant node-rel-neighbor1, node-rel-neighbor2 rows,
> it will be much, much bigger and I am worried whether it can be handled.
>
> A question: in rels.csv (https://github.com/jexp/batch-import/tree/20) I
> read that node IDs start from 0.
>
> Are they temporary IDs, or mandatory?
> E.g. what if I would like to load another subgraph into the same db with
> the batch importer (clearly without overriding the existing nodes)?
>
>
> On Tuesday, August 12, 2014 at 6:46:00 PM UTC+2, Rik Van Bruggen wrote:
>>
>> I think you should use the batch importer for this size of graph. You
>> will be done in minutes, not hours.
>>
>> https://github.com/jexp/batch-import/tree/20
>>
>> Rik
>>
>> On Tuesday, August 12, 2014 5:13:39 PM UTC+1, gg4u wrote:
>>>
>>> Hello,
>>>
>>> Here I am trying to upload a massive network: 4M nodes, 100M
>>> correlations.
>>>
>>> Having problems with memory and performance, I'd like to know whether
>>> I am doing it right:
>>>
>>> 1. Before loading the correlations, I wanted to load the nodes.
>>>
>>> 2. Set up neo4j-wrapper and neo4j.properties as written in
>>> http://www.neo4j.org/graphgist?d788e117129c3730a042, with the JVM heap
>>> set to 4096 MB. With this setting, the bulk load of 4M nodes failed.
>>>
>>> 3. Raised the min-heap and max-heap to 6144 MB and ran a test with
>>> 100K nodes.
>>>
>>> I got:
>>> Nodes created: 98991
>>> Properties set: 197982
>>> Labels added: 98991
>>> 3438685 ms
>>>
>>> Almost an hour to upload 100K nodes with two properties? I thought it
>>> should be much faster.
>>>
>>> Am I doing something wrong? This is the import code I used:
>>>
>>> CREATE CONSTRAINT ON (n:MYNODES) ASSERT n.id IS UNIQUE;
>>> CREATE INDEX ON :MYNODES(name);
>>>
>>> USING PERIODIC COMMIT 1000
>>> LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
>>> FIELDTERMINATOR '\t'
>>> WITH line, toInt(line.topicId) AS id, line.name AS name LIMIT 100000
>>> MERGE (n:MYNODES { id: id, name: name });

--
Rik Van Bruggen
[email protected]
mob: +32 478 686800
phone: +44 20 3286 2230
skype: rvanbruggen

Join us at GraphConnect 2014 San Francisco! graphconnect.com
As a friend of Neo4j, use discount code KOMPIS for $100 off registration.
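[Editor's note: an aside on the LOAD CSV statement quoted above. MERGE on two properties ({ id: ..., name: ... }) may not be able to use the unique constraint on :MYNODES(id), in which case every row can trigger a label scan, which would help explain both the slowness and the heap pressure. A minimal sketch of a variant that merges on the constrained property only and sets the name on create; the file name, label and column names are taken from the example above, and this assumes the CREATE CONSTRAINT statement has already been run and its index is populated.]

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
    FIELDTERMINATOR '\t'
    WITH toInt(line.topicId) AS id, line.name AS name
    MERGE (n:MYNODES { id: id })
    ON CREATE SET n.name = name;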
