Batch import is completely different from LOAD CSV:

- LOAD CSV is a transactional import into a running server.
- batch-import is a non-transactional, all-or-nothing import that writes the Neo4j store files directly. The server is not running at that time; after the import you start the server on top of the generated store files.

Hope that makes sense.

Rik
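[Editor's note: a rough sketch of what input for the jexp/batch-import tool discussed in this thread typically looks like. The file names, column names and invocation below are illustrative assumptions, not taken from this thread; check the repository's README for the authoritative header syntax.]

    nodes.csv  (tab-separated; in the basic setup, rows are numbered from 0 in import order)
    name	age:int
    alice	33
    bob	42

    rels.csv  (start/end refer to the node numbers assigned above)
    start	end	type	weight:float
    0	1	KNOWS	0.8

    # run against a non-running, fresh store, then start the server on graph.db afterwards
    ./import.sh graph.db nodes.csv rels.csv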
On Tue, Aug 12, 2014 at 6:44 PM, gg4u <[email protected]> wrote:

> Hi Rik!
>
> ...in minutes?
>
> I'd like to understand how I could get closer to that result, though I
> will also try that library.
>
> It's kind of strange to me, because both when using the LOAD CSV
> functionality from the shell and when doing a transaction each time, it
> looks like I run into a memory heap problem.
>
> Why should the batch import from the shell be so much slower than the
> batch-import script?
>
> Also, I see the importer is flexible enough, but my custom file (an
> adjacency list, to avoid redundancy) is more than 1 GB; if I expand it
> into a CSV full of redundant node-rel-neighbor1, node-rel-neighbor2 rows,
> it will be much, much bigger and I am worried whether it can be handled.
>
> A question: in rels.csv (https://github.com/jexp/batch-import/tree/20) I
> read that node IDs start from 0.
>
> Are they temporary IDs, or mandatory?
> E.g. what if I would like to load another subgraph into the same db with
> the batch importer (clearly without overriding the existing nodes)?
>
>
> On Tuesday, August 12, 2014 at 6:46:00 PM UTC+2, Rik Van Bruggen wrote:
>>
>> I think you should use the batch importer for this size of graph. You
>> will be done in minutes, not hours.
>>
>> https://github.com/jexp/batch-import/tree/20
>>
>> Rik
>>
>> On Tuesday, August 12, 2014 5:13:39 PM UTC+1, gg4u wrote:
>>>
>>> Hello,
>>>
>>> Here I am trying to upload a massive network: 4M nodes, 100M
>>> correlations.
>>>
>>> Having problems with memory and performance, I'd like to know whether
>>> I am doing it right:
>>>
>>> 1. Before loading the correlations, I wanted to load the nodes.
>>>
>>> 2. Set up neo4j-wrapper and neo4j.properties as written in
>>> http://www.neo4j.org/graphgist?d788e117129c3730a042, with the JVM heap
>>> set to 4096 MB. With this setting, the bulk load of 4M nodes failed.
>>>
>>> 3. Raised the min-heap and max-heap to 6144 MB and ran a test with
>>> 100K nodes.
>>>
>>> I got:
>>> Nodes created: 98991
>>> Properties set: 197982
>>> Labels added: 98991
>>> 3438685 ms
>>>
>>> Almost an hour to upload 100K nodes with two properties? I thought it
>>> should be much faster.
>>>
>>> Am I doing something wrong? This is the import code I used:
>>>
>>> CREATE CONSTRAINT ON (n:MYNODES) ASSERT n.id IS UNIQUE;
>>> CREATE INDEX ON :MYNODES(name);
>>>
>>> USING PERIODIC COMMIT 1000
>>> LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
>>> FIELDTERMINATOR '\t'
>>> WITH line, toInt(line.topicId) AS id, line.name AS name LIMIT 100000
>>> MERGE (n:MYNODES { id: id, name: name });

--
Rik Van Bruggen
[email protected]
mob: +32 478 686800
phone: +44 20 3286 2230
skype: rvanbruggen

Join us at GraphConnect 2014 San Francisco! graphconnect.com
As a friend of Neo4j, use discount code KOMPIS for $100 off registration.
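[Editor's note: an aside on the LOAD CSV statement quoted above. MERGE on two properties ({ id: ..., name: ... }) may not be able to use the unique constraint on :MYNODES(id), in which case every row can trigger a label scan, which would help explain both the slowness and the heap pressure. A minimal sketch of a variant that merges on the constrained property only and sets the name on create; the file name, label and column names are taken from the example above, and this assumes the CREATE CONSTRAINT statement has already been run and its index is populated.]

    USING PERIODIC COMMIT 1000
    LOAD CSV WITH HEADERS FROM 'file:///blablabla.csv' AS line
    FIELDTERMINATOR '\t'
    WITH toInt(line.topicId) AS id, line.name AS name
    MERGE (n:MYNODES { id: id })
    ON CREATE SET n.name = name;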
