Good luck, ping me when you've worked it out :)

Cheers, Michael
> On 09.02.2015, at 23:43, Paul Shoemaker <[email protected]> wrote:
>
> Hi Michael,
>
> Just a quick update: I have improved heavily on the code I submitted and am now achieving much higher throughput on a single thread. Thank you so much, again, for your suggestion to batch 10k transactions at once. Now on to a multi-threaded approach :-)
>
> Paul
>
> On Friday, February 6, 2015 at 7:20:22 AM UTC-8, Paul Shoemaker wrote:
>
> Hi Michael,
>
> Thank you so much for taking valuable time to assist. I have attached the relevant code and messages.log.
>
> I will admit that this is not particularly elegant, but it is more proof of concept than anything. You will see that I am using one transaction per loop iteration while iterating through my database result set. If that is a particularly expensive operation (I suspect it is), I could ease the pressure by batching 10k operations at a time. Would I benefit from using the BatchInserter process? I am going to try this today and see if I get better results.
>
> I would love to use the new batch importer, but unfortunately, as I understand it, it requires a clean graph and cannot append to a graph that already exists. For our needs, we will be performing large data imports into the same graph, and they will arrive at asynchronous times. Perhaps I could use the new concurrency in 2.2. Can you please point me to how to insert concurrently into the same graph store? My tests have all run into problems because once the store has been opened, it is locked and cannot be accessed by another process. Or perhaps I should open the store once and thread out the transactional operations? I will try this today as well.
>
> Thanks again!
>
> Paul
>
> From: Michael Hunger
> Reply-To: [email protected]
> Date: Friday, February 6, 2015 at 2:58 AM
> To: [email protected]
> Subject: Re: [Neo4j] Import Data From Oracle to Neo4J
>
> It should be much, much faster.
>
> 1. Use larger transactions (10k elements) to _batch_ your inserts.
> 2. 2.2 scales much better with concurrent/smaller transactions; e.g. I created 10M nodes in 40s with concurrent small transactions (2 nodes, 1 rel each).
>
> If you can share your code, we can have a look. Index lookups do hurt somewhat, true.
> Also share your config (heap, mmio settings, etc.); best would be graph.db/messages.log.
>
> Cheers, Michael
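For reference, a minimal sketch of the batching Michael suggests above: group roughly 10k creates per transaction while iterating a JDBC result set, instead of committing once per row. Class and method names are from the Neo4j 2.x embedded API; the JDBC URL, the table and column names, and the uniqueness lookup on :User(id) are illustrative assumptions, not Paul's actual code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import org.neo4j.graphdb.*;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class BatchedImport {
        private static final int BATCH_SIZE = 10_000;
        private static final Label USER = DynamicLabel.label("User");
        private static final RelationshipType LINKS_TO = DynamicRelationshipType.withName("LINKS_TO");

        public static void main(String[] args) throws Exception {
            GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("graph.db");
            try (Connection con = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/ORCL", "user", "pass");
                 Statement st = con.createStatement();
                 ResultSet rs = st.executeQuery("select id1, id2, rel_property from my_table")) {

                Transaction tx = db.beginTx();
                long count = 0;
                while (rs.next()) {
                    Node a = getOrCreate(db, rs.getString("id1"));
                    Node b = getOrCreate(db, rs.getString("id2"));
                    Relationship r = a.createRelationshipTo(b, LINKS_TO);
                    r.setProperty("value", rs.getInt("rel_property"));

                    // Commit every BATCH_SIZE rows instead of once per row.
                    if (++count % BATCH_SIZE == 0) {
                        tx.success();
                        tx.close();
                        tx = db.beginTx();
                    }
                }
                tx.success();
                tx.close();
            } finally {
                db.shutdown();
            }
        }

        // Keeps id nodes unique; assumes an index (or unique constraint) on :User(id)
        // was created beforehand. findNode() is the 2.2 API; on 2.1 use
        // findNodesByLabelAndProperty() instead.
        private static Node getOrCreate(GraphDatabaseService db, String id) {
            Node n = db.findNode(USER, "id", id);
            if (n == null) {
                n = db.createNode(USER);
                n.setProperty("id", id);
            }
            return n;
        }
    }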
>> On 05.02.2015, at 21:50, Paul Shoemaker <[email protected]> wrote:
>>
>> For what it's worth, I ended up using embedded Java to write directly to the graph while the server is detached. This gave me the fastest performance, as I found the REST interface far too slow for large data sets (> 1M records). I'm still not really happy with the performance, but I was able to achieve 20-25 atomic transactions per second while creating 6 nodes (with indexes) and 6 relationships. On 5 of the nodes there was an indexed lookup step (Index object), as those nodes needed to be unique (they were location nodes: city, state, zip, etc.). For 1.4M nodes total, or approximately 1.3M Postgres records, the process took around 16 hours. With the REST API, I measured approximately 30-90 ms per node creation, which would have meant roughly 24 hours on the low end and 36 hours on the high end.
>>
>> Does my performance seem consistent with reality, or is there something obvious that I'm missing?
>>
>> I'm going to run a test of something like 50-100 concurrent REST transactions against the server to see if I can speed that up. I typically use the multiprocessing module in Python or a RabbitMQ exchange for such an operation.
>>
>> It's unfortunate that the new import tool included with 2.2 can only write to a new graph db store. Our use case is graph-assisted data analysis against a unified store (with logical separation of domains by a root node), so we need to take advantage of the additive nature of the graph when batch loading data.
>>
>> Paul
>>
>> On Tuesday, February 3, 2015 at 5:43:45 PM UTC-6, Michael Hunger wrote:
>>
>> Hi Jesse,
>>
>> There are some tips on the website: http://neo4j.com/developer/guide-import-csv/
>>
>> Do you know how to create a CSV from your relational table? I agree, the batch importer makes the most sense here.
>>
>> Based on the table
>>
>>     id1 varchar, id2 varchar, rel_property int
>>
>> create a CSV file for the nodes:
>>
>>     select id1 as "id:ID", 'User' as ":LABEL" from table
>>     union
>>     select id2 as "id:ID", 'User' as ":LABEL" from table
>>
>> and a CSV for the relationships:
>>
>>     select id1 as ":START_ID", id2 as ":END_ID", rel_property as "value:INT", 'LINKS_TO' as ":TYPE" from table
>>
>> and then use the new batch importer that comes with Neo4j 2.2:
>>
>>     bin/neo4j-import --nodes nodes.csv --relationships relationships.csv --id-type string --into test.db
>>
>> If you can't use it, I suggest something like my Groovy script here: http://jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
>>
>>> On 03.02.2015, at 09:18, Jesse Liu <liu.we...@gmail.com> wrote:
>>>
>>> Hi all,
>>>
>>> I'm a beginner with the graph database Neo4j, and I need to import data from Oracle into Neo4j.
>>>
>>> First, I'll describe my application scenario. I have a single Oracle table with more than 100 million rows. The table description is:
>>>
>>>     id1 varchar, id2 varchar, relation_property int
>>>
>>> id1 and id2 together form the primary key. The Oracle server and the Neo4j server are set up on the same machine.
>>>
>>> How can I create a node for each id and one directed relationship between id1 and id2 for each row?
>>>
>>> As far as I know, there are three ways to do this:
>>>
>>> 1. Java REST / JDBC API
>>> I've written a code demo and found it too slow: 100,00 rows per minute. Besides, it's not easy to establish a Java environment in
>>>
>>> 2. Python embedded
>>> I haven't written test code yet, but I don't think it would be better than Java.
>>>
>>> 3. Batch insert
>>> Export the data from Oracle as a CSV file, then import the CSV into Neo4j using Cypher. I believe this is the fastest way to import the data. However, I don't know how to do it: all the demos I've seen on the Internet add nodes but not relationships with specific properties.
>>>
>>> Has anybody encountered such a scenario? Can you give me some advice, or is there a better solution for importing the data?
>>>
>>> Thank you very much!
>>>
>>> Jesse
>>> Feb 3rd, 2015
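On the question of adding relationships with specific properties, and on Paul's BatchInserter idea earlier in the thread: below is a minimal sketch using the Neo4j 2.x BatchInserter API to load a headerless CSV of id1,id2,relation_property rows while the database is offline. The file name, the naive comma split, and the in-memory id cache are illustrative assumptions; for 100M+ rows the cache would need to be something more compact than a HashMap.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.HashMap;
    import java.util.Map;
    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.Label;
    import org.neo4j.graphdb.RelationshipType;
    import org.neo4j.unsafe.batchinsert.BatchInserter;
    import org.neo4j.unsafe.batchinsert.BatchInserters;

    public class CsvBatchInsert {
        public static void main(String[] args) throws Exception {
            Label user = DynamicLabel.label("User");
            RelationshipType linksTo = DynamicRelationshipType.withName("LINKS_TO");

            // BatchInserter writes directly to the store files; the database must not be running.
            BatchInserter inserter = BatchInserters.inserter("graph.db");
            // One node per distinct id; keyed in memory for simplicity in this sketch.
            Map<String, Long> nodeIds = new HashMap<>();
            try (BufferedReader reader = new BufferedReader(new FileReader("rows.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] cols = line.split(",");          // id1,id2,relation_property
                    long from = nodeFor(inserter, nodeIds, user, cols[0]);
                    long to = nodeFor(inserter, nodeIds, user, cols[1]);
                    Map<String, Object> relProps = new HashMap<>();
                    relProps.put("value", Integer.parseInt(cols[2].trim()));
                    // Relationship is created with its property map in one call.
                    inserter.createRelationship(from, to, linksTo, relProps);
                }
            } finally {
                inserter.shutdown();   // flushes and closes the store
            }
        }

        private static long nodeFor(BatchInserter inserter, Map<String, Long> cache,
                                    Label label, String id) {
            return cache.computeIfAbsent(id, key -> {
                Map<String, Object> props = new HashMap<>();
                props.put("id", key);
                return inserter.createNode(props, label);
            });
        }
    }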
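And on the multi-threaded approach Paul mentions at the top of the thread: an embedded GraphDatabaseService is safe to share across threads, so the store can be opened once and the transactional work fanned out, with each worker running its own small transactions (the pattern Michael describes for 2.2). A rough sketch under those assumptions; the thread count, row count, and generated property values are placeholders:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import org.neo4j.graphdb.DynamicLabel;
    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Node;
    import org.neo4j.graphdb.Transaction;
    import org.neo4j.graphdb.factory.GraphDatabaseFactory;

    public class ConcurrentSmallTransactions {
        private static final int THREADS = 8;   // placeholder thread count

        public static void main(String[] args) throws Exception {
            // One embedded instance, opened once and shared by all worker threads.
            final GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase("graph.db");
            ExecutorService pool = Executors.newFixedThreadPool(THREADS);

            for (int w = 0; w < THREADS; w++) {
                final int offset = w;
                pool.submit(() -> {
                    // Each worker runs its own small transactions (2 nodes + 1 relationship).
                    for (int i = offset; i < 1_000_000; i += THREADS) {
                        try (Transaction tx = db.beginTx()) {
                            Node a = db.createNode(DynamicLabel.label("User"));
                            a.setProperty("id", "a-" + i);
                            Node b = db.createNode(DynamicLabel.label("User"));
                            b.setProperty("id", "b-" + i);
                            a.createRelationshipTo(b, DynamicRelationshipType.withName("LINKS_TO"))
                             .setProperty("value", i);
                            tx.success();
                        }
                    }
                });
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
            db.shutdown();
        }
    }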
