Can you show the "profile" output of the neo4j-shell of your import of a tiny variant?
e.g. your 10k file?

I could imagine it only uses one index by default and you have to force cypher to use the other index too with "USING INDEX grp:Group(name)"

On Fri, Jun 6, 2014 at 9:46 PM, Eric Olson <[email protected]> wrote:

> Yes, I also tried USING PERIODIC COMMIT with 10000 and 50000 values.
>
> Yes, as stated I set indexes on the name properties in anticipation of the
> queries.
>
> I was wrong in saying that it 'failed' because it didn't (except when
> running in the web browser and it timed out). What I meant was that it was
> taking enormous amounts of time. Much more time than the other imports if
> scaled linearly. I never did let it finish because I could no longer wait.
> With imports where there is a MATCH statement, should I expect the running
> time to be excessive in relation to imports which simply CREATEs nodes?
>
>
>
> On Friday, June 6, 2014 12:10:54 PM UTC-6, Michael Hunger wrote:
>
>> How did it fail?
>>
>> Did you try USING PERIODIC COMMIT 10000 ?
>>
>> Do you have an index for : :User(name) and :Group(name) ?
>>
>>
>> On Fri, Jun 6, 2014 at 12:34 AM, Eric Olson <[email protected]> wrote:
>>
>>> I have read some other topics on this and am still coming up short on a
>>> satisfying solution.
>>>
>>> I am:
>>>
>>> - Populating my DB using the new CSV import query in Cypher
>>> - Using the Neo4j shell
>>> - Including the "USING PERIODIC COMMIT" statement
>>>
>>> I have:
>>>
>>> - Successfully imported a 10,000 line file in ~2 seconds
>>> - Successfully imported a 500,000 line file in ~20 seconds
>>> - Successfully imported a 5,000,000 line file in ~3 minutes
>>> - FAILED to import a 100,000,000 line file!
>>>
>>> The first 3 imports were just to create some simple nodes. The failed
>>> import was to create relationships and the statement looks like:
>>>
>>>
>>> USING PERIODIC COMMIT 100000
>>> LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
>>> MATCH (usr:User { name: line.user }), (grp:Group { name: line.group })
>>> CREATE (user)-[:IN]->(grp)
>>>
>>>
>>> And yes, I have set indexes on the name properties of each so that they
>>> can be retrieved quickly.
>>>
>>> This has been spinning for well over an hour and still no completion. I
>>> am assuming based on the other timings that it should take about 30 minutes
>>> + query times to retrieve the objects I am making the relationship between.
>>> Is it still the MATCH query that is killing me here? If on average it takes
>>> 10ms for each object retrieval, then with 100M lines (200M total retrievals
>>> then), this could add up to an additional 23 days of running time :)
>>>
>>> IS THERE A BETTER WAY?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
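
For illustration, the suggestion at the top of this message (profile a tiny run and hint both indexes) might be sketched roughly as below. This is only a sketch under assumptions, not a tested fix: the LIMIT 10000 slice is an assumed stand-in for the "tiny variant", splitting the lookup into two MATCH clauses so each can carry its own hint is one possible arrangement, and whether the planner accepts an index hint when the compared value comes from the CSV row may depend on the Neo4j version. Note also that the sketch creates the relationship from usr, the identifier bound by MATCH; the quoted statement uses (user), which is not bound anywhere, so CREATE would add a new unlabeled node for every row instead of linking the matched user.

```cypher
// Sketch only: profile a small slice of the relationship import in neo4j-shell.
// LIMIT 10000 is an assumption standing in for the "tiny variant".
// Each node gets its own MATCH so that each lookup can carry an index hint.
// CREATE uses usr (bound by MATCH), not the unbound (user) of the quoted statement.
PROFILE
LOAD CSV WITH HEADERS FROM 'file:/mcpdata/5_usr-grp.csv' AS line
WITH line LIMIT 10000
MATCH (usr:User)
USING INDEX usr:User(name)
WHERE usr.name = line.user
MATCH (grp:Group)
USING INDEX grp:Group(name)
WHERE grp.name = line.group
CREATE (usr)-[:IN]->(grp);
```

If the profile output shows both lookups going through the schema index, the same statement without PROFILE and the LIMIT line, and with USING PERIODIC COMMIT in front, would be the candidate for the full file.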
