It should be much, much faster:

1. Use larger transactions (around 10k elements each) to batch your inserts (see the sketch below).
2. 2.2 scales much better with concurrent, smaller transactions; e.g. I created 10M nodes in 40s using concurrent small transactions (2 nodes, 1 relationship each).

If you can share your code, we can have a look. Index lookups do hurt, true. Also share your config (heap, mmio settings, etc.); best would be graph.db/messages.log.

Cheers, Michael
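A minimal sketch of the batching advice in point 1, using the Neo4j 2.x embedded API; the label, property name, and record count are illustrative assumptions, not taken from the thread:

    // Batch embedded-API writes into large transactions (~10k operations
    // each) instead of committing one small transaction per record.
    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class BatchedInsert {
        private static final int BATCH_SIZE = 10_000;

        public static void main(String[] args) {
            GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("test.db");
            Transaction tx = db.beginTx();
            try {
                for (int i = 1; i <= 1_000_000; i++) {
                    // "User" label and "id" property are assumptions.
                    Node node = db.createNode(DynamicLabel.label("User"));
                    node.setProperty("id", "user-" + i);
                    if (i % BATCH_SIZE == 0) {
                        // Commit the batch and open a fresh transaction.
                        tx.success();
                        tx.close();
                        tx = db.beginTx();
                    }
                }
                tx.success();
            } finally {
                tx.close();
                db.shutdown();
            }
        }
    }

The same idea applies to Paul's 6-nodes-plus-6-relationships unit of work below: group a few thousand of those units per transaction instead of committing each one atomically.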
> Am 05.02.2015 um 21:50 schrieb Paul Shoemaker <[email protected]>:
>
> For what it's worth, I ended up using embedded Java to write directly to the graph while the server is detached. This gave me the fastest performance, as I found the REST interface way too slow for large data sets (> 1M records). I'm still not really happy with the performance, but I was able to achieve 20-25 atomic transactions per second while creating 6 nodes (with indexes) and 6 relationships. On 5 of the nodes there was an indexed lookup step (Index object), as those nodes needed to be unique (they were location nodes: city, state, zip, etc.). For 1.4M nodes total, or approximately 1.3M Postgres records, the process took around 16 hours. With the REST API, I saw approximately 30-90 ms per node creation, which would have taken roughly 24 hours on the low end and 36 hours on the high end to insert.
>
> Does my performance seem consistent with reality, or is there something obvious that I'm missing?
>
> I'm going to run a test of something like 50-100 concurrent REST transactions against the server to see if I can speed that up. I typically use the multiprocessing module in Python or a RabbitMQ exchange for such an operation.
>
> It's unfortunate that the new import tool included with 2.2 can only write to a new graph db store. Our use case is graph-assisted data analysis into a unified store (with logical separation of domains by a root node), so we need to take advantage of the additive nature of the graph when batch loading data.
>
> Paul
>
> On Tuesday, February 3, 2015 at 5:43:45 PM UTC-6, Michael Hunger wrote:
>
> Hi Jesse,
>
> There are some tips on the website: http://neo4j.com/developer/guide-import-csv/
>
> Do you know how to create a CSV from your relational table? I agree, the batch importer makes the most sense there. Based on the table
>
>     id1 varchar, id2 varchar, rel_property int
>
> create a CSV file for the nodes:
>
>     select id1 as "id:ID", 'User' as ":LABEL" from table
>     union
>     select id2 as "id:ID", 'User' as ":LABEL" from table
>
> and a CSV for the relationships:
>
>     select id1 as ":START_ID", id2 as ":END_ID",
>            rel_property as "value:INT", 'LINKS_TO' as ":TYPE"
>     from table
>
> then use the new batch importer that comes with Neo4j 2.2:
>
>     bin/neo4j-import --nodes nodes.csv --relationships relationships.csv --id-type string --into test.db
>
> If you can't use it, I suggest something like my Groovy script here:
> http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
>
>> Am 03.02.2015 um 09:18 schrieb Jesse Liu <liu.we...@gmail.com>:
>>
>> Hi all,
>>
>> I'm a beginner with the graph database Neo4j, and I need to import data from Oracle into Neo4j.
>>
>> First, I'll describe my application scenario. I have a single Oracle table with more than 100 million rows:
>>
>>     id1 varchar, id2 varchar, relation_property int
>>
>> id1 and id2 are the primary key. The Oracle server and the Neo4j server are set up on the same machine.
>>
>> How can I create a node for each id and one directed relationship between id1 and id2 for each row?
>> As far as I know, there are three ways to do this:
>>
>> 1. Java REST/JDBC API
>> I've written a demo and found it too slow: 100,00 rows per minute. Besides, it's not easy to establish a Java environment in
>>
>> 2. Python embedded
>> I haven't written test code yet, but I don't think it would be better than Java.
>>
>> 3. Batch insert
>> Export the data from Oracle as a CSV file, then import the CSV data into Neo4j using Cypher. I believe this is the fastest way to import data. However, I don't know how to do it: every demo I've seen on the Internet adds nodes but doesn't add relationships with specific properties.
>>
>> Has anybody encountered such a scenario? Can you give me some advice, or is there a better way to import the data?
>>
>> Thank you very much!
>>
>> Jesse
>> Feb 3rd, 2015
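For Jesse's third option, a minimal Cypher sketch, assuming Neo4j 2.x LOAD CSV, a CSV with a header row matching the table columns (id1, id2, relation_property), and an illustrative file path; it creates both unique nodes and a directed relationship carrying the property:

    // Run first: index :User(id) so the MERGE lookups below
    // don't degrade into label scans as the store grows.
    CREATE INDEX ON :User(id);

    USING PERIODIC COMMIT 10000
    LOAD CSV WITH HEADERS FROM "file:/path/to/export.csv" AS row
    MERGE (a:User {id: row.id1})
    MERGE (b:User {id: row.id2})
    CREATE (a)-[:LINKS_TO {value: toInt(row.relation_property)}]->(b);

Unlike the neo4j-import tool, LOAD CSV writes into an existing store, which also speaks to Paul's additive-loading concern; at the 100-million-row scale in this thread, though, neo4j-import into a fresh store remains much faster.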
