Re: [Neo4j] Poor insert performance

Michael Hunger Sun, 29 Dec 2013 02:16:34 -0800

Do you run it against the server or with -path?

If the latter, please remember to set the memory options in the shell script.

I'll try your example later today.


On 29.12.2013 at 10:55, Jason W <[email protected]> wrote:

> Just realized I didn't have the index set on the right property. Doh!
> 
> After adding the index, I was able to insert a batch of 1,000 in 3.2 seconds, 
> which feels much better. With a larger batch, though, the performance 
> does not scale linearly: a batch of 25,000 took almost 15 minutes. I can 
> clearly see disk writes and garbage collection playing a role now, so I'm 
> experimenting with batch sizes. I'm on a Linux server with 64GB of available 
> memory, 64 cores, and software RAID 10 over 4 x 7,200 RPM disks. I'm using 
> the default Neo4j settings.
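
To put the two timings above side by side, here is a small sketch that converts each batch into inserts per second. It assumes "almost 15 minutes" means at most 900 seconds, so the second rate is a lower bound. The function name is illustrative only.

```python
def inserts_per_second(count: int, seconds: float) -> float:
    """Average throughput for a batch of `count` inserts taking `seconds`."""
    return count / seconds


# Figures reported in the message: 1,000 inserts in 3.2 s,
# and 25,000 inserts in "almost 15 minutes" (assumed <= 900 s).
small = inserts_per_second(1_000, 3.2)
large = inserts_per_second(25_000, 15 * 60)

print(f"small batch: {small:.1f} inserts/s")
print(f"large batch: at least {large:.1f} inserts/s")
print(f"slowdown factor: up to {small / large:.1f}x")
```

If throughput were constant, the 25,000 batch would have finished in about 80 seconds, so the larger batch runs roughly an order of magnitude slower per insert.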
> 
> Any tuning advice would be greatly appreciated!
> 
> On Sunday, December 29, 2013 12:58:16 AM UTC-6, Jason W wrote:
>> 
>> Michael,
>> I tried out your tool and I love the ease with which I was able to get going. 
>> Unfortunately, it hasn't really helped my performance issue.
>> 
>> Here's my command:
>> import-cypher -i input.csv -o output.csv MERGE (a:Attribute {coordinate: 
>> {coordinate}}) WITH a MATCH (u:User {name: 'jason'}) CREATE UNIQUE 
>> (u)-[r:HAS_ATTRIBUTE]->(a)
>> 
>> input.csv looks like this:
>> coordinate
>> 1:1
>> 1:2
>> 1:3
>> etc..
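
For anyone reproducing the test, an input file in the format above could be produced along these lines. This is only a sketch: the function name and the way coordinates are generated are assumptions, and the only thing taken from the message is the layout (a `coordinate` header followed by one value per line).

```python
import csv
from typing import Iterable


def write_coordinate_csv(path: str, coordinates: Iterable[str]) -> int:
    """Write a one-column CSV with a 'coordinate' header; return the row count."""
    count = 0
    with open(path, "w", newline="") as handle:
        # Plain "\n" line endings, matching the listing in the message.
        writer = csv.writer(handle, lineterminator="\n")
        writer.writerow(["coordinate"])
        for coordinate in coordinates:
            writer.writerow([coordinate])
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical data: 1,000 coordinates of the form "1:1", "1:2", ...
    rows = write_coordinate_csv("input.csv", (f"1:{i}" for i in range(1, 1001)))
    print(f"wrote {rows} rows")
```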
>> 
>> Running a test with 1,000 attributes in input.csv took 230 seconds, which is 
>> a measly 4.3 inserts per second.
>> 
>> 
>> 
>> 
>> On Saturday, December 28, 2013 11:29:29 PM UTC-6, Jason W wrote:
>>> 
>>> Michael,
>>> Thanks for the reply. Your tool looks pretty interesting! Looks like it 
>>> allows me to use parameters by providing a CSV file of values. I'll give 
>>> it a try.
>>> 
>>> To answer your questions..
>>> I have created a unique index on :Attribute(coordinate). The attributes are 
>>> simply nodes that need to be connected to the user. Different users will 
>>> share some of these attributes, and I need to be able to query which ones 
>>> are shared (or not shared) between various users. I was piping the 
>>> Cypher queries to just "neo4j-shell" with a running server. Should I be 
>>> using the "-file" option?
>>> 
>>> On Saturday, December 28, 2013 6:51:37 PM UTC-6, Michael Hunger wrote:
>>>> 
>>>> Jason,
>>>> 
>>>> Usually you would use parameters to speed it up. The shell also supports 
>>>> parameters; you can use "export param=value".
>>>> 
>>>> e.g.
>>>> export key="#{key}"
>>>> export user="#{user}"
>>>>> MERGE (a:Attribute {coordinate: {key}}) WITH a 
>>>>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>>> 
>>>> 
>>>> Did you create a unique index for your MERGE command (or at least a 
>>>> normal index on :Attribute(coordinate))?
>>>> 
>>>> What are the attributes for?
>>>> 
>>>> Also, combining around 20-50k elements in a single transaction would speed it up.
>>>> 
>>>> begin
>>>> 
>>>> export key="#{key}"
>>>> export user="#{user}"
>>>>> MERGE (a:Attribute {coordinate: {key}}) WITH a 
>>>>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>>> 
>>>> ...
>>>> ...
>>>> ...
>>>> commit
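
The begin/commit layout above could be generated with a small script along these lines. This is a sketch, not a tested import tool: the function name and default batch size are assumptions, the statement text is copied from the example above, and it assumes keys and user names contain no double quotes.

```python
from typing import Iterable, Iterator

# Parameterized statement from the example above, kept on one line.
STATEMENT = (
    "MERGE (a:Attribute {coordinate: {key}}) WITH a "
    "MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);"
)


def shell_script_lines(
    keys: Iterable[str], user: str, batch_size: int = 20000
) -> Iterator[str]:
    """Yield neo4j-shell lines, wrapping every `batch_size` statements in begin/commit."""
    pending = 0
    for key in keys:
        if pending == 0:
            yield "begin"
            yield f'export user="{user}"'
        yield f'export key="{key}"'
        yield STATEMENT
        pending += 1
        if pending == batch_size:
            yield "commit"
            pending = 0
    if pending:
        # Close the final, partially filled transaction.
        yield "commit"


if __name__ == "__main__":
    keys = (f"1:{i}" for i in range(1, 10001))
    with open("import.txt", "w") as handle:
        for line in shell_script_lines(keys, "jason"):
            handle.write(line + "\n")
```

The resulting file can then be passed to the shell with `bin/neo4j-shell -file import.txt`.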
>>>> 
>>>> Did you try to run bin/neo4j-shell -file file?
>>>> 
>>>> Are you running against a running server, or the shell with -path?
>>>> You probably want to do the former, so that it can use the memory config 
>>>> of the running server.
>>>> Otherwise it might make sense to configure the neo4j-shell script (if you 
>>>> edit it there is a line like this, add some sensible memory config to it):
>>>> EXTRA_JVM_ARGUMENTS="-Xmx8G -Xms8G -Xmn1G"
>>>> 
>>>> 
>>>> For fast imports of csv files with a single cypher statement like yours 
>>>> perhaps my neo4j-shell import tools would be helpful :)
>>>> Check it out here: 
>>>> https://github.com/jexp/neo4j-shell-tools/tree/20#cypher-import
>>>> 
>>>> 
>>>> HTH
>>>> 
>>>> Michael
>>>> 
>>>> On 29.12.2013 at 00:30, Jason W <[email protected]> wrote:
>>>> 
>>>>> Hi Everyone,
>>>>> I'm relatively new to Neo4j and I'm running into slowness when trying to 
>>>>> insert a batch of data. My strategy has been to write the batch of Cypher 
>>>>> queries to a text file, and then pipe that into the neo4j-shell. Here is 
>>>>> a description of my data.
>>>>> 
>>>>> Start with a single "user" node.
>>>>> Create (if not exists) many "attribute" nodes.
>>>>> Create relationships between "user" node and "attribute" nodes.
>>>>> 
>>>>> In my benchmarking, I'm creating 10,000 attribute nodes and relationships 
>>>>> from the user to the attributes. The caveat is that an attribute node 
>>>>> may already exist, and if it does I want to use the existing one instead 
>>>>> of creating a new one. My current approach uses the MERGE command to 
>>>>> create the attribute nodes (or return the node if it already exists). My 
>>>>> Cypher queries look something like this:
>>>>> 
>>>>> MERGE (a:Attribute {coordinate: '#{key}'}) WITH a 
>>>>> MATCH (u:User {name: '#{user}'}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a)
>>>>> 
>>>>> Running 10,000 sequential queries like this to insert my data is quite 
>>>>> slow. I'm getting somewhere around 20 inserts per second. Here are some 
>>>>> things I've tried to optimize:
>>>>> 
>>>>> -Batch these into a large transaction in a text file, and pipe it into 
>>>>> the neo4j-shell
>>>>> -Batch these into a large single command in a text file, and pipe into 
>>>>> neo4j-shell
>>>>> -Break into parallel jobs and insert multi threaded. Each query must be a 
>>>>> single transaction otherwise it locks.
>>>>> -Separate the MERGE commands into a batch, and the CREATE relationship 
>>>>> commands into a separate batch
>>>>> 
>>>>> I've done the tooling benchmark to test file system performance 
>>>>> (http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html) 
>>>>> and my results are great. I should be able to get upwards of 70k 
>>>>> records/sec based on the benchmark.
>>>>> 
>>>>> Can anyone advise what is the best strategy to import this type of data 
>>>>> quickly?
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> This message contains confidential information and is intended only for 
>>>>> the individual named. If you are not the named addressee you should not 
>>>>> disseminate, distribute or copy this e-mail. Please notify the sender 
>>>>> immediately by e-mail if you have received this e-mail by mistake and 
>>>>> delete this e-mail from your system. If you are not the intended 
>>>>> recipient you are notified that disclosing, copying, distributing or 
>>>>> taking any action in reliance on the contents of this information is 
>>>>> strictly prohibited.
>>>>> 
>>>>> -- 
>>>>> You received this message because you are subscribed to the Google Groups 
>>>>> "Neo4j" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>>>> email to [email protected].
>>>>> For more options, visit https://groups.google.com/groups/opt_out.

