For what it's worth, I ended up using embedded Java to write directly to the graph while the server is detached. This gave me the best performance, as I found the REST interface far too slow for large data sets (> 1M records). I'm still not really happy with the performance, but I was able to achieve 20-25 atomic transactions per second, each creating 6 nodes (with indexes) and 6 relationships. For 5 of the nodes there was an indexed lookup step (Index object), as those nodes needed to be unique (they were location nodes: city, state, zip, etc.). For 1.4M nodes total, or approximately 1.3M Postgres records, the process took around 16 hours. With the REST API, I measured approximately 30-90 ms for each node creation, which would have taken approximately 24 hours on the low end and approximately 36 hours on the high end to insert.
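A quick back-of-the-envelope check of those figures, as a minimal sketch. It assumes one transaction per source record and strictly sequential calls; neither is stated outright in the post, and the function names are made up for illustration.

```python
def hours_at_rate(items, per_second):
    """Hours needed to process `items` at a steady rate of `per_second`."""
    return items / per_second / 3600.0


def hours_at_latency(items, ms_each):
    """Hours needed for `items` sequential calls taking `ms_each` ms apiece."""
    return items * ms_each / 1000.0 / 3600.0


RECORDS = 1_300_000  # source records; assumed to map to one transaction each
NODES = 1_400_000    # total nodes reported in the post

# Embedded Java: 20-25 transactions per second.
embedded = [hours_at_rate(RECORDS, rate) for rate in (25, 20)]

# REST: 30-90 ms per node-creation call, counting node creation only.
rest = [hours_at_latency(NODES, ms) for ms in (30, 90)]

print("embedded: %.1f to %.1f h" % tuple(embedded))
print("REST, node creation only: %.1f to %.1f h" % tuple(rest))
```

Under those assumptions the embedded range brackets the 16 hours observed. The REST line multiplies latency by node count only, so it is a floor for node creation alone; relationship and index requests would come on top of it.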
Does my performance seem consistent with reality, or is there something obvious that I'm missing? I'm going to run a test with something like 50-100 concurrent REST transactions against the server to see if I can speed that up. I typically use the multiprocessing module in Python or a RabbitMQ exchange for such an operation. It's unfortunate that the new import tool included with 2.2 can only write to a new graph db store. Our use case is graph-assisted data analysis on a unified store (with logical separation of domains by a root node), so we need to take advantage of the additive nature of the graph when batch loading data.

Paul

On Tuesday, February 3, 2015 at 5:43:45 PM UTC-6, Michael Hunger wrote:
>
> Hi Jesse,
>
> there are some tips on the website,
> http://neo4j.com/developer/guide-import-csv/
>
> Do you know how to create a CSV from your relational table?
>
> I agree, the batch-importer makes most sense there.
>
> based on the table
>
> id1 varchar, id2 varchar, rel_property int
>
> If you create a csv file for the nodes
>
> select id1 as "id:ID", 'User' as ":LABEL" from table
> union
> select id2 as "id:ID", 'User' as ":LABEL" from table
>
> and for the relationships a csv
>
> select id1 as ":START_ID", id2 as ":END_ID", rel_property as "value:INT",
> 'LINKS_TO' as ":TYPE" from table
>
> and then use the new batch-importer that comes with neo4j 2.2
>
> bin/neo4j-import --nodes nodes.csv --relationships relationships.csv
> --id-type string --into test.db
>
> If you can't use it, I suggest something like my groovy script here:
> jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
>
> On 03.02.2015 at 09:18, Jesse Liu <[email protected]> wrote:
>
> Hi, All,
>
> I'm a beginner with the graph database Neo4j.
> Now I need to import data from Oracle into Neo4j.
>
> First, I'll describe my application scenario.
>
> I have just one Oracle table with more than 100 million rows.
> The table description is:
> id1 varchar, id2 varchar, relation_property int.
>
> id1 and id2 are the primary key.
>
> The Oracle server and the Neo4j server are set up on the same machine.
>
> Now how can I create nodes for each id and one directed relationship
> between id1 and id2 for each row?
>
> As far as I know, there are three ways to do this:
>
> 1. Java REST JDBC API
> I wrote a code demo and found it too slow: 100,00 rows per minute.
> Besides, it's not easy to establish a Java Environment in
>
> 2. Python Embedded
> I haven't written test code yet, but I don't think it will be better than Java.
>
> 3. Batch Insert
> Export the data from Oracle as a CSV file;
> Import the CSV data into Neo4j using Cypher.
> I believe it's the fastest way to import data. However, I don't know how
> to do this. All the demos I've seen on the Internet are about adding nodes,
> but not about adding relationships with specific properties.
>
> Has anybody encountered such a scenario? Can you give me some
> advice? Or is there a better solution for importing the data?
>
> Thank you very much!
>
> Jesse
> Feb 3rd, 2015
>
> --
> You received this message because you are subscribed to the Google Groups
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
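For the CSV route suggested in the quoted reply, here is a minimal Python sketch of producing the two files from rows of (id1, id2, rel_property). The column headers, the `User` label and the `LINKS_TO` type are taken from Michael's message; the function name, the sample rows and the temporary output directory are placeholders of mine, and in practice the rows would come from a database cursor over the Oracle table.

```python
import csv
import os
import tempfile


def write_import_csvs(rows, nodes_path, rels_path):
    """Write node and relationship CSVs from (id1, id2, rel_property) rows.

    Returns the number of distinct node ids written.
    """
    seen = set()  # ids already emitted, so each node appears only once
    with open(nodes_path, "w", newline="") as nf, \
            open(rels_path, "w", newline="") as rf:
        nodes = csv.writer(nf)
        rels = csv.writer(rf)
        nodes.writerow(["id:ID", ":LABEL"])
        rels.writerow([":START_ID", ":END_ID", "value:INT", ":TYPE"])
        for id1, id2, value in rows:
            for node_id in (id1, id2):
                if node_id not in seen:
                    seen.add(node_id)
                    nodes.writerow([node_id, "User"])
            rels.writerow([id1, id2, int(value), "LINKS_TO"])
    return len(seen)


# Placeholder rows standing in for the real table contents.
sample = [("a", "b", 3), ("b", "c", 5)]
out_dir = tempfile.mkdtemp()
nodes_file = os.path.join(out_dir, "nodes.csv")
rels_file = os.path.join(out_dir, "relationships.csv")
count = write_import_csvs(sample, nodes_file, rels_file)
print(count, "nodes written to", nodes_file)
```

The two files have the shape that the `bin/neo4j-import --nodes nodes.csv --relationships relationships.csv --id-type string --into test.db` command from the thread expects. Note that the in-memory `seen` set grows with the number of distinct ids, so for a table with more than 100 million rows the SQL `union` from the quoted message may be the more practical way to deduplicate nodes.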
