Do you run it against the server or with -path? If the latter, please remember to set the memory options in the shell script.
I'll try your example later today.

Sent from mobile device

On 29.12.2013 at 10:55, Jason W <[email protected]> wrote:

> Just realized I didn't have the index set on the right property. Doh!
>
> After adding the index, I was able to insert a batch of 1,000 in 3.2 seconds
> which feels much better. When trying a larger batch though, the performance
> does not scale linearly - a 25,000 batch took almost 15 minutes. I can clearly
> see disk writes and garbage collection playing a role now, so I'm playing
> around with batch sizes now. I'm on a Linux server with 64GB available
> memory, 64 cores, and software RAID 10 over 4 x 7,200 RPM disks. I'm using
> default settings on neo4j.
>
> Any tuning advice would be greatly appreciated!
>
> On Sunday, December 29, 2013 12:58:16 AM UTC-6, Jason W wrote:
>>
>> Michael,
>> I tried out your tool and I love the ease with which I was able to get going.
>> Unfortunately, it hasn't really helped my performance issue.
>>
>> Here's my command:
>> import-cypher -i input.csv -i output.csv MERGE (a:Attribute {coordinate:
>> {coordinate}}) WITH a match (u:User {name: 'jason'}) CREATE UNIQUE
>> (u)-[r:HAS_ATTRIBUTE]->(a)
>>
>> input.csv looks like this:
>> coordinate
>> 1:1
>> 1:2
>> 1:3
>> etc..
>>
>> Running a test with 1000 attributes in input.csv took 230 seconds, which is
>> a measly 4.3 inserts per second.
>>
>> On Saturday, December 28, 2013 11:29:29 PM UTC-6, Jason W wrote:
>>>
>>> Michael,
>>> Thanks for the reply. Your tool looks pretty interesting! Looks like it
>>> allows me to use parameters by providing a CSV file of values. I'll give
>>> it a try.
>>>
>>> To answer your questions..
>>> I have created a unique index on :Attribute(coordinate). The attributes are
>>> simply nodes that need to be connected to the user. Different users will
>>> share some of these attributes, and I need to be able to query which ones
>>> are shared (or not shared) between various users.
>>> I was running by piping the Cypher queries to just "neo4j-shell" with a
>>> running server. Should I be using the "-file" option?
>>>
>>> On Saturday, December 28, 2013 6:51:37 PM UTC-6, Michael Hunger wrote:
>>>>
>>>> Jason,
>>>>
>>>> usually you would use parameters to speed it up. The shell also supports
>>>> parameters, you can use "export param=value"
>>>>
>>>> e.g.
>>>> export key="#{key}"
>>>> export user="#{user}"
>>>>> MERGE (a:Attribute {coordinate: {key}}) WITH a
>>>>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>>>
>>>>
>>>> did you create a unique index for your merge command? (or at least a
>>>> normal index on :Attribute(coordinate))
>>>>
>>>> What are the attributes for?
>>>>
>>>> Also combining around 20-50k elements in a single tx would speed it up.
>>>>
>>>> begin
>>>>
>>>> export key="#{key}"
>>>> export user="#{user}"
>>>>> MERGE (a:Attribute {coordinate: {key}}) WITH a
>>>>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>>>
>>>> ...
>>>> ...
>>>> ...
>>>> commit
>>>>
>>>> did you try to run bin/neo4j-shell -file file
>>>>
>>>> are you running against a running server? or the shell with -path ?
>>>> You probably want to do the former, so that it can use the memory config
>>>> of the running server.
>>>> Otherwise it might make sense to configure the neo4j-shell script (if you
>>>> edit it there is a line like this, add some sensible memory config to it):
>>>> EXTRA_JVM_ARGUMENTS="-Xmx8G -Xms8G -Xmn1G"
>>>>
>>>>
>>>> For fast imports of csv files with a single cypher statement like yours
>>>> perhaps my neo4j-shell import tools would be helpful :)
>>>> Check it out here:
>>>> https://github.com/jexp/neo4j-shell-tools/tree/20#cypher-import
>>>>
>>>>
>>>> HTH
>>>>
>>>> Michael
>>>>
>>>> On 29.12.2013 at 00:30, Jason W <[email protected]> wrote:
>>>>
>>>>> Hi Everyone,
>>>>> I'm relatively new to neo4j and I'm running into slowness when trying to
>>>>> insert a batch of data.
>>>>> My strategy has been to write the batch of Cypher
>>>>> queries to a text file, and then pipe that into the neo4j-shell. Here is
>>>>> a description of my data.
>>>>>
>>>>> Start with a single "user" node.
>>>>> Create (if not exists) many "attribute" nodes.
>>>>> Create relationships between "user" node and "attribute" nodes.
>>>>>
>>>>> In my benchmarking, I'm creating 10,000 attribute nodes and relationships
>>>>> from the user to the attributes. The caveat is that an attribute node
>>>>> may already exist, and if it does I want to use the existing one instead
>>>>> of creating a new one. My current approach uses the MERGE command to
>>>>> create the attribute nodes (or return the node if it already exists). My
>>>>> Cypher queries look something like this:
>>>>>
>>>>> MERGE (a:Attribute {coordinate: '#{key}'}) WITH a
>>>>> MATCH (u:User {name: '#{user}'}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a)
>>>>>
>>>>> Running 10,000 sequential queries like this to insert my data is quite
>>>>> slow. I'm getting somewhere around 20 inserts per second. Here are some
>>>>> things I've tried to optimize:
>>>>>
>>>>> -Batch these into a large transaction in a text file, and pipe it into
>>>>> the neo4j-shell
>>>>> -Batch these into a large single command in a text file, and pipe into
>>>>> neo4j-shell
>>>>> -Break into parallel jobs and insert multi threaded. Each query must be a
>>>>> single transaction otherwise it locks.
>>>>> -Separate the MERGE commands into a batch, and the CREATE relationship
>>>>> commands into a separate batch
>>>>>
>>>>> I've done the tooling benchmark to test file system performance
>>>>> (http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html)
>>>>> and my results are great. I should be able to get upwards of 70k
>>>>> records/sec based on the benchmark.
>>>>>
>>>>> Can anyone advise what is the best strategy to import this type of data
>>>>> quickly?
--
You received this message because you are subscribed to the Google Groups "Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
For more options, visit https://groups.google.com/groups/opt_out.
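
For anyone following along: the batching Michael describes in the thread (parameters via "export", with many statements wrapped in one begin/commit) could be generated with something like the script below. This is only a rough sketch, not a tested import tool. The function names and the default batch size are made up for illustration, and it assumes the shell accepts "export", "begin" and "commit" exactly as written in Michael's example.

```python
from typing import Iterable, Iterator, List

# The parameterized statement from the thread; {key} and {user} are Cypher
# parameters filled from the shell's exported variables.
STATEMENT = (
    "MERGE (a:Attribute {coordinate: {key}}) WITH a "
    "MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);"
)


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    batch: List[str] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def export_line(name: str, value: str) -> str:
    """Format a shell variable assignment (hypothetical helper).

    Assumption: values never contain double quotes, so no escaping is done.
    """
    if '"' in value:
        raise ValueError("double quotes in values are not handled")
    return 'export %s="%s"' % (name, value)


def shell_script_lines(
    user: str, coordinates: Iterable[str], batch_size: int = 20000
) -> Iterator[str]:
    """Yield neo4j-shell input lines, one transaction per batch."""
    for batch in chunked(coordinates, batch_size):
        yield "begin"
        yield export_line("user", user)
        for coordinate in batch:
            yield export_line("key", coordinate)
            yield STATEMENT
        yield "commit"
```

Writing the result of "\n".join(shell_script_lines("jason", coordinates)) to a file and passing that file to bin/neo4j-shell -file gives one transaction per batch. The batch size is the knob to experiment with, since the thread reports that a 25,000 batch did not scale linearly from a 1,000 batch.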
