Hi Uwe,

the problem is that those IDs are internal to Neo4j, where they represent
disk record-ids. If you provide high values there, Neo4j will create _a
lot_ of empty records until it reaches your ids.
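(Rough arithmetic, assuming a node record of about 14 bytes in Neo4j 2.x:
reserving record-ids up to 315,041,100 means writing roughly 315M * 14
bytes, i.e. about 4.4 GB of mostly empty node store, which is where your
60 seconds go; ids up to 3,150,411 only need about 44 MB.)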

So either you create your node-ids starting from 0 and store your own id
as a normal node property, or you don't provide node-ids at all and only
look nodes up via their "business-id" value.

The first variant looks like this:

i:id    id:long    l:label
0    315041100    Person
1    201215100    Person
2    315041200    Person

start:id    end:id    type    relart
0    1    HAS_RELATION    30006
2    0    HAS_RELATION    30006

For the second variant you have to configure and use an index:

i:id    id:long:people    l:label
0    315041100    Person
1    201215100    Person
2    315041200    Person

id:long:people    id:long:people    type    relart
315041100    201215100    HAS_RELATION    30006
315041200    315041100    HAS_RELATION    30006
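For this variant the index also has to be declared in batch.properties,
following the same pattern as the node_auto_index line you already have
("people" is just the index name used in the headers above):

batch_import.node_index.people=exact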


HTH Michael

Alternatively, you can also just write a small Java or Groovy program to
import your data if handling those ids with the batch-importer is too
tricky.
See: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
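
A minimal sketch of that approach against the Neo4j 2.x BatchInserter API
(untested; file names and column layout are taken from your example below,
tab-separated as the batch-importer expects, with the HAS_RELATION type
hard-coded and error handling left out):

import java.io.*;
import java.util.*;
import org.neo4j.graphdb.*;
import org.neo4j.unsafe.batchinsert.BatchInserter;
import org.neo4j.unsafe.batchinsert.BatchInserters;

public class ImportTool {
    public static void main(String[] args) throws IOException {
        BatchInserter inserter = BatchInserters.inserter("graph.db");
        Label person = DynamicLabel.label("Person");
        RelationshipType hasRelation =
                DynamicRelationshipType.withName("HAS_RELATION");
        // business-id -> internal node-id
        Map<Long, Long> nodes = new HashMap<>();
        try {
            BufferedReader r = new BufferedReader(new FileReader("nodes.csv"));
            String line = r.readLine(); // skip header
            while ((line = r.readLine()) != null) {
                String[] cols = line.split("\t"); // i:id, l:label
                long businessId = Long.parseLong(cols[0]);
                long nodeId = inserter.createNode(
                        Collections.<String, Object>singletonMap("id", businessId),
                        person);
                nodes.put(businessId, nodeId);
            }
            r.close();

            r = new BufferedReader(new FileReader("rels.csv"));
            line = r.readLine(); // skip header
            while ((line = r.readLine()) != null) {
                String[] cols = line.split("\t"); // start, end, type, relart
                inserter.createRelationship(
                        nodes.get(Long.parseLong(cols[0])),
                        nodes.get(Long.parseLong(cols[1])),
                        hasRelation,
                        Collections.<String, Object>singletonMap("relart", cols[3]));
            }
            r.close();
        } finally {
            inserter.shutdown(); // flushes the store files to disk
        }
    }
}

The HashMap holds the business-id to internal-id mapping; for 40M nodes
that needs a few GB of heap, which your 16 GB machine should handle.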

On Tue, Oct 28, 2014 at 8:39 AM, Uwe Ludwig <[email protected]>
wrote:

> Hi all,
>
> I want to import CSV files with about 40 million lines into Neo4j. For
> this I try to use the "batch-importer" from
> https://github.com/jexp/batch-import.
> Maybe it's a problem that I provide my own IDs. This is the example:
>
>
> nodes.csv:
> i:id    l:label
> 315041100    Person
> 201215100    Person
> 315041200    Person
>
> rels.csv :
> start    end    type    relart
> 315041100    201215100    HAS_RELATION    30006
> 315041200    315041100    HAS_RELATION    30006
>
> the content of batch.properties:
> use_memory_mapped_buffers=true
> neostore.nodestore.db.mapped_memory=1000M
> neostore.relationshipstore.db.mapped_memory=5000M
> neostore.propertystore.db.mapped_memory=4G
> neostore.propertystore.db.strings.mapped_memory=2000M
> neostore.propertystore.db.arrays.mapped_memory=1000M
> neostore.propertystore.db.index.keys.mapped_memory=1500M
> neostore.propertystore.db.index.mapped_memory=1500M
> batch_import.node_index.node_auto_index=exact
>
>
> "./import.sh graph.db nodes.csv rels.csv"
>
> will be processed without errors, but it takes about 60 seconds! When I
> use smaller IDs - for example 3150411 instead of 315041100 - it takes
> just 1 second!
> Actually I would like to use even bigger IDs with 10 digits. I don't know
> what I'm doing wrong. Can anyone see an error? Do I have to assign an
> explicit type (long?) for the IDs?
> How can I do this?
>
>
> - JDK 1.7
> - batch-importer 2.1.3 (with Neo4j 2.1.3)
> - OS: ubuntu 14.04
> - Hardware: 8-Core-Intel-CPU, 16GB RAM
>
> Best regards and thanks in advance
>
> Uwe
>
