It should be much, much faster:

1. Use larger transactions (around 10k elements each) to batch your inserts (see the sketch below).
2. 2.2 scales much better with concurrent, smaller transactions; e.g. I created 10M nodes in 40s using concurrent small transactions (2 nodes, 1 relationship each).

If you can share your code, we can have a look. Index lookups do hurt, true. Also share your config (heap, mmio settings, etc.); best would be graph.db/messages.log.

Cheers, Michael
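A minimal sketch of the batching advice in point 1, using the Neo4j 2.x embedded API; the label, property name, and record count are illustrative assumptions, not taken from the thread:

    // Batch embedded-API writes into large transactions (~10k operations
    // each) instead of committing one small transaction per record.
    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class BatchedInsert {
        private static final int BATCH_SIZE = 10_000;

        public static void main(String[] args) {
            GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("test.db");
            Transaction tx = db.beginTx();
            try {
                for (int i = 1; i <= 1_000_000; i++) {
                    // "User" label and "id" property are assumptions.
                    Node node = db.createNode(DynamicLabel.label("User"));
                    node.setProperty("id", "user-" + i);
                    if (i % BATCH_SIZE == 0) {
                        // Commit the batch and open a fresh transaction.
                        tx.success();
                        tx.close();
                        tx = db.beginTx();
                    }
                }
                tx.success();
            } finally {
                tx.close();
                db.shutdown();
            }
        }
    }

The same idea applies to Paul's 6-nodes-plus-6-relationships unit of work below: group a few thousand of those units per transaction instead of committing each one atomically.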
> Am 05.02.2015 um 21:50 schrieb Paul Shoemaker <[email protected]>:
>
> For what it's worth, I ended up using embedded Java to write directly to the graph while the server is detached. This gave me the fastest performance, as I found the REST interface way too slow for large data sets (> 1M records). I'm still not really happy with the performance, but I was able to achieve 20-25 atomic transactions per second while creating 6 nodes (with indexes) and 6 relationships. On 5 of the nodes there was an indexed lookup step (Index object), as those nodes needed to be unique (they were location nodes: city, state, zip, etc.). For 1.4M nodes total, or approximately 1.3M Postgres records, the process took around 16 hours. With the REST API, I saw approximately 30-90 ms per node creation, which would have taken roughly 24 hours on the low end and 36 hours on the high end to insert.
>
> Does my performance seem consistent with reality, or is there something obvious that I'm missing?
>
> I'm going to run a test of something like 50-100 concurrent REST transactions against the server to see if I can speed that up. I typically use the multiprocessing module in Python or a RabbitMQ exchange for such an operation.
>
> It's unfortunate that the new import tool included with 2.2 can only write to a new graph db store. Our use case is graph-assisted data analysis into a unified store (with logical separation of domains by a root node), so we need to take advantage of the additive nature of the graph when batch loading data.
>
> Paul
>
> On Tuesday, February 3, 2015 at 5:43:45 PM UTC-6, Michael Hunger wrote:
>
> Hi Jesse,
>
> There are some tips on the website: http://neo4j.com/developer/guide-import-csv/
>
> Do you know how to create a CSV from your relational table? I agree, the batch importer makes the most sense there. Based on the table
>
>     id1 varchar, id2 varchar, rel_property int
>
> create a CSV file for the nodes:
>
>     select id1 as "id:ID", 'User' as ":LABEL" from table
>     union
>     select id2 as "id:ID", 'User' as ":LABEL" from table
>
> and a CSV for the relationships:
>
>     select id1 as ":START_ID", id2 as ":END_ID",
>            rel_property as "value:INT", 'LINKS_TO' as ":TYPE"
>     from table
>
> then use the new batch importer that comes with Neo4j 2.2:
>
>     bin/neo4j-import --nodes nodes.csv --relationships relationships.csv --id-type string --into test.db
>
> If you can't use it, I suggest something like my Groovy script here:
> http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
>
>> Am 03.02.2015 um 09:18 schrieb Jesse Liu <liu.we...@gmail.com>:
>>
>> Hi all,
>>
>> I'm a beginner with the graph database Neo4j, and I need to import data from Oracle into Neo4j.
>>
>> First, I'll describe my application scenario. I have a single Oracle table with more than 100 million rows:
>>
>>     id1 varchar, id2 varchar, relation_property int
>>
>> id1 and id2 are the primary key. The Oracle server and the Neo4j server are set up on the same machine.
>>
>> How can I create a node for each id and one directed relationship between id1 and id2 for each row?
>> As far as I know, there are three ways to do this:
>>
>> 1. Java REST/JDBC API
>> I've written a demo and found it too slow: 100,00 rows per minute. Besides, it's not easy to establish a Java environment in
>>
>> 2. Python embedded
>> I haven't written test code yet, but I don't think it would be better than Java.
>>
>> 3. Batch insert
>> Export the data from Oracle as a CSV file, then import the CSV data into Neo4j using Cypher. I believe this is the fastest way to import data. However, I don't know how to do it: every demo I've seen on the Internet adds nodes but doesn't add relationships with specific properties.
>>
>> Has anybody encountered such a scenario? Can you give me some advice, or is there a better way to import the data?
>>
>> Thank you very much!
>>
>> Jesse
>> Feb 3rd, 2015
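For Jesse's third option, a minimal Cypher sketch, assuming Neo4j 2.x LOAD CSV, a CSV with a header row matching the table columns (id1, id2, relation_property), and an illustrative file path; it creates both unique nodes and a directed relationship carrying the property:

    // Run first: index :User(id) so the MERGE lookups below
    // don't degrade into label scans as the store grows.
    CREATE INDEX ON :User(id);

    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM "file:/path/to/export.csv" AS row
    MERGE (a:User {id: row.id1})
    MERGE (b:User {id: row.id2})
    CREATE (a)-[:LINKS_TO {value: toInt(row.relation_property)}]->(b);

Unlike the neo4j-import tool, LOAD CSV writes into an existing store, which also speaks to Paul's additive-loading concern; at the 100-million-row scale in this thread, though, neo4j-import into a fresh store remains much faster.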
