Re: [Neo4j] CSV importer chokes on large files creating relations

Michael Hunger Fri, 06 Jun 2014 14:26:38 -0700

Can you show the "profile" output from the neo4j-shell for the import of a
tiny variant?


e.g. your 10k file?
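Here is a minimal sketch of such a profiling run in neo4j-shell. It rests on two assumptions:

- A hypothetical 10,000-line sample of the user-group file is saved as `/mcpdata/5_usr-grp-10k.csv`. That name is made up for illustration.
- The shell's `PROFILE` prefix accepts a LOAD CSV query in the version being used.

USING PERIODIC COMMIT is left out because a sample this small fits in one transaction. PROFILE really executes the query, so the relationships are created. A throwaway copy of the database is the safer place to try it.

```cypher
PROFILE
LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp-10k.csv' AS line
// Same two lookups as the full import, run on the hypothetical 10k-line sample
MATCH (usr:User { name: line.user }), (grp:Group { name: line.group })
CREATE (usr)-[:IN]->(grp);
```

Look at the step that finds the Group nodes in the printed plan. Suppose it is a label scan plus filter rather than an index lookup, and its db hits are far above the number of CSV lines. That would support the one-index theory below.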

I could imagine that it only uses one index by default and that you have to
force Cypher to use the other index too with "USING INDEX grp:Group(name)".
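Below is a sketch of the import with an explicit hint for each lookup. It assumes both indexes already exist. The two lookups are split into separate MATCH clauses, and each one carries its own USING INDEX hint and a WHERE predicate on the indexed property. The commit size of 10000 follows the earlier suggestion in this thread and is not a tuned value.

```cypher
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
// Look up the user through the :User(name) index
MATCH (usr:User) USING INDEX usr:User(name) WHERE usr.name = line.user
// Look up the group through the :Group(name) index
MATCH (grp:Group) USING INDEX grp:Group(name) WHERE grp.name = line.group
// Connect the two nodes that were just matched
CREATE (usr)-[:IN]->(grp);
```

Writing one MATCH per node keeps each hint next to the predicate it applies to. Whether this actually removes the slowdown is something the PROFILE output of the small sample should confirm first.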


On Fri, Jun 6, 2014 at 9:46 PM, Eric Olson <[email protected]> wrote:

> Yes, I also tried USING PERIODIC COMMIT with values of 10000 and 50000.
>
> Yes, as stated I set indexes on the name properties in anticipation of the
> queries.
>
> I was wrong to say that it 'failed', because it didn't (except when I
> ran it in the web browser, where it timed out). What I meant was that it was
> taking an enormous amount of time, much more than the other imports would
> suggest if scaled linearly. I never let it finish because I could not wait
> any longer. With imports that include a MATCH statement, should I expect the
> running time to be excessive compared with imports that simply CREATE nodes?
>
>
>
> On Friday, June 6, 2014 12:10:54 PM UTC-6, Michael Hunger wrote:
>
>> How did it fail?
>>
>> Did you try USING PERIODIC COMMIT 10000 ?
>>
>> Do you have an index for :User(name) and :Group(name)?
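For reference, the two indexes being asked about can be created with the schema-index statements below. The label and property names are taken from the query in this thread. Indexes are populated in the background, so it is worth waiting until they are online before starting the import.

```cypher
// Schema indexes for the two lookup properties used by the import
CREATE INDEX ON :User(name);
CREATE INDEX ON :Group(name);
```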
>>
>>
>> On Fri, Jun 6, 2014 at 12:34 AM, Eric Olson <[email protected]> wrote:
>>
>>> I have read some other topics on this and am still coming up short on a
>>> satisfying solution.
>>>
>>> I am:
>>>
>>>    - Populating my DB using the new CSV import query in Cypher
>>>    - Using the Neo4j shell
>>>    - Including the "USING PERIODIC COMMIT" statement
>>>
>>> I have:
>>>
>>>    - Successfully imported a 10,000 line file in ~2 seconds
>>>    - Successfully imported a 500,000 line file in ~20 seconds
>>>    - Successfully imported a 5,000,000 line file in ~3 minutes
>>>    - FAILED to import a 100,000,000 line file!
>>>
>>> The first 3 imports were just to create some simple nodes. The failed
>>> import was to create relationships and the statement looks like:
>>>
>>>
>>> USING PERIODIC COMMIT 100000
>>> LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
>>> MATCH (usr:User { name: line.user }), (grp:Group { name: line.group })
>>> CREATE (usr)-[:IN]->(grp);
>>>
>>>
>>> And yes, I have set indexes on the name properties of each so that they
>>> can be retrieved quickly.
>>>
>>> This has been spinning for well over an hour with no completion yet.
>>> Based on the other timings, I am assuming it should take about an hour,
>>> plus the query time to retrieve the nodes I am creating the relationships
>>> between. Is it still the MATCH query that is killing me here? If each
>>> lookup takes 10 ms on average, then 100M lines (200M lookups in total)
>>> could add up to an additional 23 days of running time :)
>>>
>>> IS THERE A BETTER WAY?
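The back-of-envelope numbers in this message can be recomputed directly in the shell. The inputs are the figures quoted above:

- about 3 minutes for the 5,000,000-line import,
- two lookups per line,
- an assumed 10 ms per lookup.

Linear scaling of the earlier import is itself an assumption. The first column comes out at 60 minutes and the second at roughly 23.1 days.

```cypher
// Linear extrapolation of the 5,000,000-line import (~3 minutes) to 100,000,000 lines,
// and the extra time for 2 lookups per line at an assumed 10 ms each
RETURN 3.0 * (100000000 / 5000000) AS extrapolated_minutes,
       2 * 100000000 * 10 / 1000.0 / 86400 AS lookup_days;
```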
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>
