I have read some other topics on this and am still coming up short on a
satisfying solution.
I am:
- Populating my DB using the new CSV import query in Cypher
- Using the Neo4j shell
- Including the "USING PERIODIC COMMIT" statement
I have:
- Successfully imported a 10,000-line file in ~2 seconds
- Successfully imported a 500,000-line file in ~20 seconds
- Successfully imported a 5,000,000-line file in ~3 minutes
- FAILED to import a 100,000,000-line file!
The first three imports only created some simple nodes. The failed
import creates relationships, and the statement looks like this:
USING PERIODIC COMMIT 100000
LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
MATCH (usr:User { name: line.user }), (grp:Group { name: line.group })
CREATE (usr)-[:IN]->(grp)
And yes, I have created indexes on the name property of both User and Group
so that the nodes can be looked up quickly.
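
As a sketch of the index setup described above, assuming Neo4j 2.x
schema-index syntax and the labels and property used in the import
statement, the statements would look roughly like this:

```cypher
// Assumption: Neo4j 2.x syntax; labels and property taken from the MATCH above.
// One schema index per label, so each lookup by name avoids a label scan.
CREATE INDEX ON :User(name);
CREATE INDEX ON :Group(name);
```

Indexes are populated in the background, so they need to be online before
the import starts for the MATCH to benefit from them.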
This has been running for well over an hour and still has not completed. Based
on the other timings, I assumed it would take about 30 minutes, plus the query
time needed to retrieve the nodes I am creating the relationship between.
Is it still the MATCH that is killing me here? If each node lookup takes
10ms on average, then with 100M lines (200M lookups in total), this could
add up to an additional 23 days of running time :)
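
The back-of-the-envelope estimate above can be checked directly in the
shell. This is only a sketch: the 10 ms per lookup is an assumed figure,
not a measurement.

```cypher
// 100M lines x 2 lookups per line x 10 ms (assumed) per lookup,
// converted from milliseconds to seconds, then to days (86400 s per day).
RETURN 100000000 * 2 * 10 / 1000.0 / 86400 AS estimated_days;
```

This works out to 2,000,000 seconds, or a little over 23 days.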
IS THERE A BETTER WAY?