Hi Michael,

Just wanted to give a quick update: I have substantially improved the code I submitted and achieved much higher throughput on a single thread. Thank you so much, again, for your suggestion to batch 10k transactions at once. Now on to a multi-threaded approach :-)
Paul

On Friday, February 6, 2015 at 7:20:22 AM UTC-8, Paul Shoemaker wrote:
>
> Hi Michael,
>
> Thank you so much for taking valuable time to assist. I have attached the
> relevant code and messages.log.
>
> I will admit that this is not particularly elegant, but it is more POC
> than anything. You will see that I am using a transaction per loop trip
> while iterating through my db resultset. If that is a particularly
> expensive operation (I have a suspicion that it might be), I could reduce
> the pressure by batching 10k at a time. Would I benefit from using the
> BatchInserter process? I am going to try this today and see whether I get
> better results.
>
> I would love to use the new batch importer, but unfortunately, as I
> understand it, it requires a clean graph and cannot append to a graph that
> already exists. For our needs, we will be performing large data imports
> into the same graph, and they will arrive asynchronously. Perhaps I could
> use the new concurrency in 2.2. Can you please point me to how to insert
> concurrently into the same graph file? My tests have all given me issues
> because once the file has been opened, it is locked and cannot be accessed
> by another process. Or perhaps I should open the file once and thread out
> the transactional operations? I will try this today as well.
>
> Thanks again!
>
> Paul
>
>
> From: Michael Hunger
> Reply-To: <[email protected]>
> Date: Friday, February 6, 2015 at 2:58 AM
> To: "[email protected]"
> Subject: Re: [Neo4j] Import Data From Oracle to Neo4J
>
> It should be much, much faster.
>
> 1. Use larger transactions (10k elements) to _batch_ your inserts.
> 2. 2.2 supports *much* better scaling of concurrent/smaller transactions;
> e.g. I created 10M nodes in 40s with concurrent small transactions
> (2 nodes, 1 rel).
>
> If you can share your code, we can have a look. Index lookups hurt
> somewhat, true.
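[Editor's note: Michael's first tip, committing roughly 10k writes per transaction instead of one per row, can be sketched independently of Neo4j. A minimal Python sketch; the per-batch work is stubbed out, since a real version would open a Neo4j transaction per batch:]

```python
def batched(rows, batch_size=10_000):
    """Yield successive batches so each one can be written in a single transaction."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# Illustrative: each batch would map to one transaction (tx.success()/tx.close()
# in the 2.x embedded API), so 25,000 rows cost 3 commits instead of 25,000.
commits = 0
for batch in batched(range(25_000)):
    # ... create nodes/relationships for every row in `batch` ...
    commits += 1

print(commits)  # 3
```

[The batch size of 10k matches Michael's suggestion; the sweet spot depends on heap size and row width.]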
> Also share your config (heap, mmio settings etc.); best would be
> graph.db/messages.log.
>
> Cheers, Michael
>
> On 05.02.2015 at 21:50, Paul Shoemaker <[email protected]> wrote:
>
> For what it's worth, I ended up using embedded Java to write directly to
> the graph while the server is detached. This gave me the fastest
> performance, as I found the REST interface far too slow for large data
> sets (> 1M records). I'm still not really happy with the performance, but
> I was able to achieve 20-25 atomic transactions per second while creating
> 6 nodes (with indexes) and 6 relationships. On 5 of the nodes there was an
> indexed lookup step (Index object), as those nodes needed to be unique
> (they were location nodes: city, state, zip, etc.). For 1.4M nodes total,
> or approximately 1.3M postgres db records, the process took around 16
> hours. With the REST API, I saw approximately 30ms-90ms per node creation,
> which would have taken approximately 24 hours on the low end and 36 hours
> on the high end.
>
> Does my performance seem consistent with reality, or is there something
> obvious that I'm missing?
>
> I'm going to run a test of something like 50-100 concurrent REST
> transactions against the server to see if I can speed that up. I
> typically use the multiprocessing module in Python or a RabbitMQ exchange
> for such an operation.
>
> It's unfortunate that the new import tool included with 2.2 can only
> write to a new graph db store. Our use case is graph-assisted data
> analysis into a unified store (with logical separation of domains by a
> root node), so we need to take advantage of the additive nature of the
> graph when batch loading data.
>
> Paul
>
> On Tuesday, February 3, 2015 at 5:43:45 PM UTC-6, Michael Hunger wrote:
>>
>> Hi Jesse,
>>
>> There are some tips on the website:
>> http://neo4j.com/developer/guide-import-csv/
>>
>> Do you know how to create a CSV from your relational table?
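[Editor's note: the fan-out pattern Paul describes, many workers issuing REST transactions in parallel, can be sketched with Python's standard library. The `insert_batch` body is a stand-in; a real version would POST Cypher statements to the server, and the worker/batch counts here are illustrative:]

```python
from concurrent.futures import ThreadPoolExecutor

def insert_batch(batch):
    # Stand-in for one REST call (or one embedded transaction) writing a batch;
    # here we just report how many rows would have been written.
    return len(batch)

rows = list(range(1_000))
batches = [rows[i:i + 100] for i in range(0, len(rows), 100)]

# 10 workers submit batches concurrently, a small-scale stand-in for the
# 50-100 concurrent REST transactions described above.
with ThreadPoolExecutor(max_workers=10) as pool:
    written = sum(pool.map(insert_batch, batches))

print(written)  # 1000
```

[Threads suit I/O-bound REST calls; for CPU-bound client-side work, `multiprocessing`, as Paul mentions, avoids the GIL.]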
>>
>> I agree, the batch-importer makes the most sense there.
>>
>> Based on the table
>>
>> id1 varchar, id2 varchar, rel_property int
>>
>> create a CSV file for the nodes:
>>
>> select id1 as "id:ID", 'User' as ":LABEL" from table
>> union
>> select id2 as "id:ID", 'User' as ":LABEL" from table
>>
>> and one for the relationships:
>>
>> select id1 as ":START_ID", id2 as ":END_ID", rel_property as "value:INT",
>> 'LINKS_TO' as ":TYPE" from table
>>
>> and then use the new batch importer that comes with Neo4j 2.2:
>>
>> bin/neo4j-import --nodes nodes.csv --relationships relationships.csv
>> --id-type string --into test.db
>>
>> If you can't use it, I suggest something like my Groovy script here:
>> jexp.de/blog/2014/10/flexible-neo4j-batch-import-with-groovy/
>>
>> On 03.02.2015 at 09:18, Jesse Liu <[email protected]> wrote:
>>
>> Hi all,
>>
>> I'm a beginner with the graph database Neo4j.
>> I need to import data from Oracle to Neo4j.
>>
>> First, I'll describe my application scenario.
>>
>> I have just one Oracle table with more than 100 million rows.
>> The table description is:
>> id1 varchar, id2 varchar, relation_property int.
>>
>> id1 and id2 together form the primary key.
>>
>> The Oracle server and the Neo4j server are set up on the same machine.
>>
>> How can I create a node for each id and one directed relationship from
>> id1 to id2 for each row?
>>
>> As far as I know, there are three ways to do this:
>> 1. Java REST/JDBC API
>> I've written a code demo and found it's too slow: 100,00 rows per minute.
>> Besides, it's not easy to establish a Java environment in
>>
>> 2. Python embedded
>> I haven't written test code yet, but I don't think it would be better
>> than Java.
>>
>> 3. Batch insert
>> Export the data from Oracle as a CSV file;
>> import the CSV data into Neo4j using Cypher.
>> I believe this is the fastest way to import data. However, I don't know
>> how to do it.
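[Editor's note: for reference, the two SELECTs above should yield CSV files with exactly these headers for `neo4j-import`. A tiny Python sketch with made-up ids (`a1`, `b2`) and a made-up property value (`42`):]

```python
import csv
import io

# nodes.csv: one row per distinct id, labelled User
nodes = io.StringIO()
writer = csv.writer(nodes)
writer.writerow(["id:ID", ":LABEL"])
writer.writerows([["a1", "User"], ["b2", "User"]])

# relationships.csv: one row per source-table row
rels = io.StringIO()
writer = csv.writer(rels)
writer.writerow([":START_ID", ":END_ID", "value:INT", ":TYPE"])
writer.writerow(["a1", "b2", "42", "LINKS_TO"])

print(nodes.getvalue().splitlines()[0])  # id:ID,:LABEL
print(rels.getvalue().splitlines()[0])   # :START_ID,:END_ID,value:INT,:TYPE
```

[The `:ID`/`:START_ID`/`:END_ID` columns must agree with `--id-type string`, and `value:INT` tells the importer to store the property as an integer.]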
>> All the demos I've seen on the Internet are about adding nodes, without
>> adding relationships with specific properties.
>>
>> I wonder whether anybody has encountered such a scenario. Can you give me
>> some advice? Or is there a better solution for importing the data?
>>
>> Thank you very much!
>>
>> Jesse
>> Feb 3rd, 2015
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
