Re: [Neo4j] Poor insert performance

Jason W Sat, 28 Dec 2013 22:58:27 -0800

Michael,
I tried out your tool and I love the ease with which I was able to get going. 
Unfortunately, it hasn't really helped my performance issue.


Here's my command:
import-cypher -i input.csv -o output.csv MERGE (a:Attribute {coordinate: 
{coordinate}}) WITH a MATCH (u:User {name: 'jason'}) CREATE UNIQUE 
(u)-[r:HAS_ATTRIBUTE]->(a)

input.csv looks like this:
coordinate
1:1
1:2
1:3
etc.
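
For anyone who wants to reproduce the test, here is a minimal sketch that generates a file in the same shape. It assumes the values are simply "row:col" pairs (the sample above only shows 1:1, 1:2, 1:3, so the real data may differ), and that the header name is what gets bound to the {coordinate} parameter in the statement. The function name write_coordinates is made up for illustration.

```python
import csv
import io


def write_coordinates(out, rows, cols):
    """Write a one-column CSV: a 'coordinate' header, then 'row:col' values."""
    writer = csv.writer(out, lineterminator="\n")
    # Assumption: the header must match the {coordinate} parameter name.
    writer.writerow(["coordinate"])
    for r in range(1, rows + 1):
        for c in range(1, cols + 1):
            writer.writerow([f"{r}:{c}"])


# Small in-memory demo; pass an open file handle instead to create input.csv.
buf = io.StringIO()
write_coordinates(buf, rows=1, cols=3)
print(buf.getvalue(), end="")
```

Calling write_coordinates(f, rows=10, cols=100) on an open file would give 1000 data rows, the same size as the test described here.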

Running a test with 1000 attributes in input.csv took 230 seconds, which is 
a measly 4.3 inserts per second or so.
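
To put that number in context, a quick back-of-the-envelope calculation. The 1000 rows and 230 seconds are from the run above; extrapolating linearly to the 10,000 attributes of the original benchmark is only an assumption.

```python
def inserts_per_second(count, seconds):
    """Average throughput over the whole run."""
    return count / seconds


rate = inserts_per_second(1000, 230)
print(f"{rate:.2f} inserts/s")

# Assumption: throughput stays constant as the input grows.
minutes = 10_000 / rate / 60
print(f"{minutes:.1f} minutes for 10,000 attributes")
```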




On Saturday, December 28, 2013 11:29:29 PM UTC-6, Jason W wrote:
>
> Michael,
> Thanks for the reply. Your tool looks pretty interesting! Looks like it 
> allows me to use parameters by providing a CSV file of values. I'll give 
> it a try.
>
> To answer your questions..
> I have created a unique index on :Attribute(coordinate). The attributes 
> are simply nodes that need to be connected to the user. Different users 
> will share some of these attributes, and I need to be able to query which 
> ones are shared (or not shared) between various users. I was running the 
> import by piping the Cypher queries to plain "neo4j-shell" against a 
> running server. Should I be using the "-file" option?
>
> On Saturday, December 28, 2013 6:51:37 PM UTC-6, Michael Hunger wrote:
>>
>> Jason,
>>
>> Usually you would use parameters to speed it up. The shell also supports 
>> parameters; you can set them with "export param=value".
>>
>> e.g.
>> export key="#{key}"
>> export user="#{user}"
>>
>> MERGE (a:Attribute {coordinate: {key}}) WITH a 
>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>
>>
>> Did you create a unique index for your MERGE command (or at least a 
>> normal index on :Attribute(coordinate))?
>>
>> What are the attributes for?
>>
>> Also, combining around 20-50k elements in a single tx would speed it up.
>>
>> begin
>>
>> export key="#{key}"
>> export user="#{user}"
>>
>> MERGE (a:Attribute {coordinate: {key}}) WITH a 
>> MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);
>>
>> ...
>> ...
>> ...
>> commit
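
A minimal sketch of how such a script file could be generated, mirroring the begin / export / statement / commit layout quoted above. The function name shell_script, the batch_size default and the sample values are made up for illustration, and the code assumes that no key or user name contains a double quote (nothing is escaped).

```python
def shell_script(keys, user, batch_size=20000):
    """Yield neo4j-shell script lines, one begin/commit block per batch."""
    statement = (
        "MERGE (a:Attribute {coordinate: {key}}) WITH a "
        "MATCH (u:User {name: {user}}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a);"
    )
    for start in range(0, len(keys), batch_size):
        yield "begin"
        yield f'export user="{user}"'
        for key in keys[start:start + batch_size]:
            # Re-export the key parameter before each statement.
            yield f'export key="{key}"'
            yield statement
        yield "commit"


# Tiny demo with a batch size of 2 so that two transactions are produced.
for line in shell_script(["1:1", "1:2", "1:3"], "jason", batch_size=2):
    print(line)
```

The printed lines can be written to a file and passed to bin/neo4j-shell with the -file option.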
>>
>> Did you try to run bin/neo4j-shell -file file?
>>
>> Are you running against a running server, or the shell with -path?
>> You probably want to do the former, so that it can use the memory config 
>> of the running server.
>> Otherwise it might make sense to configure the neo4j-shell script (if you 
>> edit it there is a line like this, add some sensible memory config to it):
>> EXTRA_JVM_ARGUMENTS="-Xmx8G -Xms8G -Xmn1G"
>>
>>
>> For fast imports of csv files with a single cypher statement like yours 
>> perhaps my neo4j-shell import tools would be helpful :)
>> Check it out here: 
>> https://github.com/jexp/neo4j-shell-tools/tree/20#cypher-import
>>
>>
>> HTH
>>
>> Michael
>>
>> On 29.12.2013 at 00:30, Jason W <[email protected]> wrote:
>>
>> Hi Everyone,
>> I'm relatively new to Neo4j and I'm running into slowness when trying to 
>> insert a batch of data. My strategy has been to write the batch of Cypher 
>> queries to a text file, and then pipe that into the neo4j-shell. Here is a 
>> description of my data.
>>
>> Start with a single "user" node.
>> Create (if not exists) many "attribute" nodes.
>> Create relationships between "user" node and "attribute" nodes.
>>
>> In my benchmarking, I'm creating 10,000 attribute nodes and relationships 
>> from the user to the attributes. The caveat is that an attribute node may 
>> already exist, and if it does I want to use the existing one instead of 
>> creating a new one. My current approach uses the MERGE command to create 
>> the attribute nodes (or return the node if it already exists). My Cypher 
>> queries look something like this:
>>
>> MERGE (a:Attribute {coordinate: '#{key}'}) WITH a 
>> MATCH (u:User {name: '#{user}'}) CREATE UNIQUE (u)-[r:HAS_ATTR]->(a)
>>
>> Running 10,000 sequential queries like this to insert my data is quite 
>> slow. I'm getting somewhere around 20 inserts per second. Here are some 
>> things I've tried to optimize:
>>
>> -Batch these into a large transaction in a text file, and pipe it into 
>> the neo4j-shell
>> -Batch these into a large single command in a text file, and pipe into 
>> neo4j-shell
>> -Break into parallel jobs and insert multithreaded. Each query must be a 
>> single transaction, otherwise it locks.
>> -Separate the MERGE commands into a batch, and the CREATE relationship 
>> commands into a separate batch
>>
>> I've done the tooling benchmark to test file system performance (
>> http://docs.neo4j.org/chunked/milestone/linux-performance-guide.html) 
>> and my results are great. I should be able to get upwards of 70k 
>> records/sec based on the benchmark.
>>
>> Can anyone advise what is the best strategy to import this type of data 
>> quickly?
>>
>>
>>
>>
>> This message contains confidential information and is intended only for 
>> the individual named. If you are not the named addressee you should not 
>> disseminate, distribute or copy this e-mail. Please notify the sender 
>> immediately by e-mail if you have received this e-mail by mistake and 
>> delete this e-mail from your system. If you are not the intended recipient 
>> you are notified that disclosing, copying, distributing or taking any 
>> action in reliance on the contents of this information is strictly 
>> prohibited.
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>>
>>
>
>


