Re: [Neo4j] CSV importer chokes on large files creating relations

Eric Olson Fri, 06 Jun 2014 12:46:58 -0700

Yes, I also tried USING PERIODIC COMMIT with values of 10000 and 50000.

Yes, as stated I set indexes on the name properties in anticipation of the 
queries.
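
For reference, a minimal sketch of the index setup I mean, assuming the 
Neo4j 2.x schema syntax and the labels and property from my import statement:

```cypher
// Assumed 2.x-style index creation on the properties used for lookup.
CREATE INDEX ON :User(name);
CREATE INDEX ON :Group(name);
```

The indexes have to be online before the import starts for the lookups to 
use them; as far as I know, the shell's schema command lists their state.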


I was wrong to say that it 'failed', because it did not (except when 
running in the web browser, where it timed out). What I meant was that it was 
taking an enormous amount of time, far more than the other imports would 
suggest if scaled linearly. I never let it finish because I could no longer 
wait. For imports that include a MATCH clause, should I expect the running 
time to be much longer than for imports that simply CREATE nodes?
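
One way I could isolate the lookup cost is a read-only run over a sample of 
the file. This is only a sketch: the 10000-row LIMIT is an arbitrary choice, 
and it assumes the same file, labels and column names as my import statement.

```cypher
// Time only the two lookups on the first 10000 rows; nothing is written.
LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
WITH line LIMIT 10000
MATCH (usr:User { name: line.user })
MATCH (grp:Group { name: line.group })
RETURN count(*);
```

If this sample is already slow, the MATCH lookups are the bottleneck rather 
than the relationship creation.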



On Friday, June 6, 2014 12:10:54 PM UTC-6, Michael Hunger wrote:
>
> How did it fail?
>
> Did you try USING PERIODIC COMMIT 10000 ?
>
> Do you have an index for :User(name) and :Group(name)?
>
>
> On Fri, Jun 6, 2014 at 12:34 AM, Eric Olson <[email protected]> wrote:
>
>> I have read some other topics on this and am still coming up short on a 
>> satisfying solution.
>>
>> I am:
>>
>>    - Populating my DB using the new CSV import query in Cypher 
>>    - Using the Neo4j shell
>>    - Including the "USING PERIODIC COMMIT" statement
>>
>> I have:
>>
>>    - Successfully imported a 10,000 line file in ~2 seconds
>>    - Successfully imported a 500,000 line file in ~20 seconds
>>    - Successfully imported a 5,000,000 line file in ~3 minutes 
>>    - FAILED to import a 100,000,000 line file!
>>
>> The first 3 imports were just to create some simple nodes. The failed 
>> import was to create relationships and the statement looks like:
>>
>>
>> USING PERIODIC COMMIT 100000
>> LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
>> MATCH (usr:User { name: line.user }), (grp:Group { name: line.group })
>> CREATE (usr)-[:IN]->(grp)
>>
>>
>> And yes, I have set indexes on the name properties of each so that they 
>> can be retrieved quickly.
>>
>> This has been spinning for well over an hour and still no completion. I 
>> am assuming based on the other timings that it should take about 30 minutes 
>> + query times to retrieve the objects I am making the relationship between. 
>> Is it still the MATCH query that is killing me here? If on average it takes 
>> 10ms for each object retrieval, then with 100M lines (200M total retrievals 
>> then), this could add up to an additional 23 days of running time :)
>>
>> IS THERE A BETTER WAY?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/d/optout.
>>
>
>
