Re: [Neo4j] Re: LOAD CSV takes over an hour

Michael Hunger Sat, 28 Jun 2014 02:11:29 -0700

Let me send you my version of your import when I get back home.

Sent from mobile device


On 28.06.2014 at 10:22, Pavan Kumar <[email protected]> wrote:

> I have changed my constraints, and I am sure my labels and identifiers are no
> longer the same. But the query still runs for a long time; in the log file I
> can see "Applications threads blocked for" messages, and I am getting the same
> "GC overhead limit exceeded" error. I have currently set 2 GB in the JVM
> options file. Kindly tell me if I am making any mistake in my Cypher statements.
> create constraint on (ChemicalName:chemicalname) assert 
> ChemicalName.chemicalname is unique;
> create constraint on (Chemicalid:chemicalid) assert Chemicalid.chemicalid is 
> unique;
> create constraint on (Genesymb:genesymbol) assert Genesymb.genesymbol is 
> unique;
> create constraint on (GeneID:geneid) assert GeneID.geneid is unique;
> create constraint on (form:geneform) assert form.geneform is unique;
> create constraint on (Interac:interaction) assert Interac.interaction is 
> unique;
> create constraint on (Interactactions:interactionactions) assert 
> Interactactions.interactionactions is unique;
> create constraint on (Pubmedid:pubmed) assert Pubmedid.pubmed is unique;
> create index on :ChemicalName(chemicalname);
> create index on :Chemicalid(chemicalid);
> create index on :Genesymb(genesymbol);
> create index on :GeneID(geneid);
> create index on :form(geneform);
> create index on :Interac(interaction);
> create index on :Interactactions(interactionactions);
> create index on :Pubmedid(pubmed);
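In `CREATE CONSTRAINT ON (x:Label)` the name before the colon is only an identifier and the name after it is the label, so the constraints above end up on labels such as `chemicalname` and `geneid`, while the LOAD CSV statement below looks up `Geneid`, `ChemicalID`, `Interaction` and so on (and the indexes use yet another set of labels). A minimal sketch in the Neo4j 2.x syntax used in this thread, assuming the labels and property keys exactly as the statement below spells them are the intended ones: a uniqueness constraint for nodes created with MERGE, a plain index for nodes only looked up with MATCH. In 2.x a uniqueness constraint is already backed by its own index, so an extra CREATE INDEX on the same label and property should not be needed.

```cypher
// Nodes created with MERGE: uniqueness constraint (backed by an index).
// The identifier "n" is arbitrary; the label after the colon is what matters.
CREATE CONSTRAINT ON (n:ChemicalID) ASSERT n.chemicalid IS UNIQUE;
CREATE CONSTRAINT ON (n:Geneform) ASSERT n.geneform IS UNIQUE;
CREATE CONSTRAINT ON (n:Interaction) ASSERT n.interact IS UNIQUE;
CREATE CONSTRAINT ON (n:Organism) ASSERT n.organism IS UNIQUE;
CREATE CONSTRAINT ON (n:Interactionactions) ASSERT n.interr IS UNIQUE;
CREATE CONSTRAINT ON (n:PubmedID) ASSERT n.pub IS UNIQUE;

// Nodes the import only looks up with MATCH: a plain schema index is enough.
CREATE INDEX ON :Geneid(geneid);
CREATE INDEX ON :Genesymbol(genesymbol);
```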
> 
> USING PERIODIC COMMIT 1000
> LOAD CSV WITH HEADERS FROM
> "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv"
> AS chemgeneinteractions
> match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
> match (genesymbol:Genesymbol{genesymbol: chemgeneinteractions.GeneSymbol})
> merge (chemicalname:ChemicalID{chemicalid: chemgeneinteractions.ChemicalID, 
> chemicalname: chemgeneinteractions.ChemicalName})
> ON CREATE SET 
> chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
> merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
> merge (interations:Interaction{interact: chemgeneinteractions.Interaction})
> merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
> merge (interaction:Interactionactions{interr: 
> chemgeneinteractions.InteractionActions})
> merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
> merge (geneid)-[:Gene_Symbol]->(genesymbol)
> merge (geneid)-[:chemicalname]->(chemicalname)
> merge (geneid)-[:geneform]->(genefrom)
> merge (geneid)-[:Its_interaction_action]->(interactions)
> merge (geneid)-[:Its_interaction]->(interaction)
> merge (geneid)-[:PubmedID]->(pubmed)
> merge (geneid)-[:Related_To]->(organism)
> merge (genesymbol)-[:chemicalname]->(chemicalname)
> merge (genesymbol)-[:geneform]->(genefrom)
> merge (genesymbol)-[:geneform]->(genefrom)
> merge (genesymbol)-[:Its_interaction_action]->(interactions)
> merge (genesymbol)-[:Its_interaction]->(interaction)
> merge (genesymbol)-[:PubmedID]->(pubmed)
> merge (genesymbol)-[:Related_To]->(organism)
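Besides the schema, the statement itself has a few slips that make it do more work than intended. The relationship MERGEs refer to `genefrom`, `interactions` and `organism`, but the nodes were bound as `geneform`, `interations` and `oraganism`, so for those three MERGE matches or creates an anonymous, unlabeled node instead of reusing the merged one; ON CREATE SET assigns `chemicalid` twice, the second time with the chemical name; and the `geneform` relationship from `genesymbol` is merged twice. A sketch of the same single-pass statement with those fixed, keeping labels, property keys and relationship types as written, and assuming `Its_interaction` should point at the `Interaction` node and `Its_interaction_action` at the `Interactionactions` node:

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MATCH (gene:Geneid {geneid: row.GeneID})
MATCH (symbol:Genesymbol {genesymbol: row.GeneSymbol})
// MERGE on the unique key only; the name is set once, when the node is created
MERGE (chemical:ChemicalID {chemicalid: row.ChemicalID})
  ON CREATE SET chemical.chemicalname = row.ChemicalName
MERGE (form:Geneform {geneform: row.GeneForms})
MERGE (interaction:Interaction {interact: row.Interaction})
MERGE (organism:Organism {organism: row.Organism})
MERGE (actions:Interactionactions {interr: row.InteractionActions})
MERGE (pubmed:PubmedID {pub: row.PubMedIDs})
// Every identifier used below is bound above, so no anonymous nodes get created
MERGE (gene)-[:Gene_Symbol]->(symbol)
MERGE (gene)-[:chemicalname]->(chemical)
MERGE (gene)-[:geneform]->(form)
MERGE (gene)-[:Its_interaction]->(interaction)
MERGE (gene)-[:Its_interaction_action]->(actions)
MERGE (gene)-[:PubmedID]->(pubmed)
MERGE (gene)-[:Related_To]->(organism)
MERGE (symbol)-[:chemicalname]->(chemical)
MERGE (symbol)-[:geneform]->(form)
MERGE (symbol)-[:Its_interaction]->(interaction)
MERGE (symbol)-[:Its_interaction_action]->(actions)
MERGE (symbol)-[:PubmedID]->(pubmed)
MERGE (symbol)-[:Related_To]->(organism);
```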
> 
> Thanks
> 
> 
> 
> On Thu, Jun 26, 2014 at 6:57 PM, Michael Hunger 
> <[email protected]> wrote:
>> Your constraints are wrong: you mixed up labels and identifiers.
>> 
>> Please also check the index properties.
>> 
>> And I had better success doing multiple passes, one for each set of
>> elements to connect.
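A minimal sketch of that multi-pass approach for the CTD file in this thread: first one LOAD CSV pass per node type, each merging on a single constrained key, then one pass per relationship type that only MATCHes existing nodes and MERGEs the connection. Labels, keys and the relationship type are taken from the query above; only three passes are shown, and the remaining node and relationship types would follow the same pattern. Each statement is run on its own, since USING PERIODIC COMMIT has to start the statement.

```cypher
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
// Pass 1: chemical nodes only, merged on their unique id
MERGE (chemical:ChemicalID {chemicalid: row.ChemicalID})
  ON CREATE SET chemical.chemicalname = row.ChemicalName;

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
// Pass 2: the same pattern for another node type
MERGE (:PubmedID {pub: row.PubMedIDs});

USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
// Pass 3: connect nodes that already exist; both lookups go through the schema
MATCH (gene:Geneid {geneid: row.GeneID})
MATCH (chemical:ChemicalID {chemicalid: row.ChemicalID})
MERGE (gene)-[:chemicalname]->(chemical);
```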
>> 
>> On 26.06.2014 at 11:58, Pavan Kumar <[email protected]> wrote:
>> 
>>> My JVM properties include
>>> 
>>> # Enter one VM parameter per line, note that some parameters can only be set once.
>>> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
>>> -Xmx6144m
>>> Xmx4G -Xms4G -Xmn1G
>>> 
>>> But I am still getting the "GC overhead limit exceeded" error (I have tried
>>> everything from 512m to 6GB).
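Two things stand out in these JVM options: the file is read one parameter per line, but the line `Xmx4G -Xms4G -Xmn1G` puts three options on one line and the first lacks its leading dash; and `-Xmx6144m` sets the maximum heap a second time. A sketch of the 4 GB heap suggested elsewhere in this thread, written one option per line with a single `-Xmx`:

```
-Xms4G
-Xmx4G
-Xmn1G
```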
>>> My neo4j.properties file contains:
>>> neostore.nodestore.db.mapped_memory=100M
>>> neostore.relationshipstore.db.mapped_memory=2G
>>> neostore.propertystore.db.mapped_memory=200M
>>> neostore.propertystore.db.strings.mapped_memory=200M
>>> neostore.propertystore.db.arrays.mapped_memory=0M
>>> 
>>> Any more suggestions for getting rid of the error?
>>> 
>>> On Wed, Jun 25, 2014 at 6:34 PM, Pavan Kumar <[email protected]> 
>>> wrote:
>>>> My properties file is:
>>>> 
>>>> # Default values for the low-level graph engine
>>>> 
>>>> neostore.nodestore.db.mapped_memory=100M
>>>> neostore.relationshipstore.db.mapped_memory=2G
>>>> neostore.propertystore.db.mapped_memory=200M
>>>> neostore.propertystore.db.strings.mapped_memory=200M
>>>> neostore.propertystore.db.arrays.mapped_memory=0M
>>>> 
>>>> # Enable this to be able to upgrade a store from an older version
>>>> allow_store_upgrade=true
>>>> 
>>>> # Enable this to specify a parser other than the default one.
>>>> #cypher_parser_version=2.0
>>>> 
>>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>>> # legacy reasons To limit space needed to store historical logs use values such
>>>> # as: "7 days" or "100M size" instead of "true"
>>>> keep_logical_logs=true
>>>> 
>>>> # Autoindexing
>>>> 
>>>> # Enable auto-indexing for nodes, default is false
>>>> #node_auto_indexing=true
>>>> 
>>>> # The node property keys to be auto-indexed, if enabled
>>>> #node_keys_indexable=name,age
>>>> 
>>>> # Enable auto-indexing for relationships, default is false
>>>> #relationship_auto_indexing=true
>>>> 
>>>> # The relationship property keys to be auto-indexed, if enabled
>>>> #relationship_keys_indexable=name,age
>>>> cache_type=strong
>>>> 
>>>> 
>>>> 
>>>> My JVM options are:
>>>> # Enter one VM parameter per line, note that some parameters can only be set once.
>>>> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
>>>> -Xmx512m
>>>> 
>>>> But I am still getting the "GC overhead limit exceeded" error.
>>>> Can somebody kindly suggest something?
>>>> 
>>>> On Wed, Jun 25, 2014 at 5:01 PM, Michael Hunger 
>>>> <[email protected]> wrote:
>>>>> Use 1000 here:
>>>>> USING PERIODIC COMMIT 1000
>>>>> 
>>>>> Increase the memory settings to 6G.
>>>>> 
>>>>> As you are running on Windows:
>>>>> 
>>>>>> neostore.nodestore.db.mapped_memory=100M
>>>>>> neostore.relationshipstore.db.mapped_memory=2G
>>>>>> neostore.propertystore.db.mapped_memory=200M
>>>>>> neostore.propertystore.db.strings.mapped_memory=200M
>>>>>> neostore.propertystore.db.arrays.mapped_memory=0M
>>>>> 
>>>>> Also, can you share your complete CSV with me privately?
>>>>> 
>>>>> Do you have any nodes in your dataset that have many (100k-1M)
>>>>> relationships?
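One way to check this before importing is to let LOAD CSV count the rows per key straight from the file, since in the import query each row becomes one set of relationship MERGEs for its gene. A sketch using the GeneID column of the file from this thread; it only reads, so no periodic commit is needed, and the same can be done for ChemicalID or any other column that ends up as a node:

```cypher
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
// One group per gene id; rowCount is an upper bound on that gene's relationships per type
RETURN row.GeneID AS geneid, count(*) AS rowCount
ORDER BY rowCount DESC
LIMIT 10;
```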
>>>>> 
>>>>> On Wed, Jun 25, 2014 at 1:23 PM, Pavan Kumar <[email protected]> 
>>>>> wrote:
>>>>>> Hi,
>>>>>> My query is as follows:
>>>>>> create constraint on (ChemicalName:chemicalname) assert 
>>>>>> ChemicalName.chemicalname is unique;
>>>>>> create constraint on (ChemicalID:chemicalid) assert 
>>>>>> ChemicalID.chemicalid is unique;
>>>>>> create constraint on (Genesymbol:genesymbol) assert 
>>>>>> Genesymbol.genesymbol is unique;
>>>>>> create constraint on (Geneid:geneid) assert Geneid.geneid is unique;
>>>>>> create constraint on (Geneform:geneform) assert Geneform.geneform is 
>>>>>> unique;
>>>>>> create constraint on (Interaction:interaction) assert 
>>>>>> Interaction.interaction is unique;
>>>>>> create constraint on (Interactionactions:interactionactions) assert 
>>>>>> Interactionactions.interactionactions is unique;
>>>>>> create constraint on (PubmedID:pubmed) assert PubmedID.pubmed is unique;
>>>>>> create index on :ChemicalName(chemicalname);
>>>>>> create index on :ChemicalID(chemicalid);
>>>>>> create index on :Genesymbol(genesymbol);
>>>>>> create index on :Geneid(geneid);
>>>>>> create index on :Geneform(geneform);
>>>>>> create index on :Interaction(interaction);
>>>>>> create index on :Interactionactions(interactionactions);
>>>>>> create index on :PubmedID(pubmed);
>>>>>> 
>>>>>> USING PERIODIC COMMIT 10000
>>>>>> LOAD CSV WITH HEADERS FROM
>>>>>> "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv"
>>>>>> AS chemgeneinteractions
>>>>>> match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
>>>>>> match (genesymbol:Genesymbol{genesymbol: 
>>>>>> chemgeneinteractions.GeneSymbol})
>>>>>> merge (chemicalname:ChemicalID{chemicalid: 
>>>>>> chemgeneinteractions.ChemicalID, chemicalname: 
>>>>>> chemgeneinteractions.ChemicalName})
>>>>>> ON CREATE SET 
>>>>>> chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
>>>>>> merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
>>>>>> merge (interations:Interaction{interact: 
>>>>>> chemgeneinteractions.Interaction})
>>>>>> merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
>>>>>> merge (interaction:Interactionactions{interr: 
>>>>>> chemgeneinteractions.InteractionActions})
>>>>>> merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
>>>>>> merge (geneid)-[:Gene_Symbol]->(genesymbol)
>>>>>> merge (geneid)-[:chemicalname]->(chemicalname)
>>>>>> merge (geneid)-[:geneform]->(genefrom)
>>>>>> merge (geneid)-[:Its_interaction_action]->(interactions)
>>>>>> merge (geneid)-[:Its_interaction]->(interaction)
>>>>>> merge (geneid)-[:PubmedID]->(pubmed)
>>>>>> merge (geneid)-[:Related_To]->(organism)
>>>>>> merge (genesymbol)-[:chemicalname]->(chemicalname)
>>>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>>>> merge (genesymbol)-[:Its_interaction_action]->(interactions)
>>>>>> merge (genesymbol)-[:Its_interaction]->(interaction)
>>>>>> merge (genesymbol)-[:PubmedID]->(pubmed)
>>>>>> merge (genesymbol)-[:Related_To]->(organism)
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> My JVM properties are:
>>>>>> -Xmx512m
>>>>>> -XX:+UseConcMarkSweepGC
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Wed, Jun 25, 2014 at 4:49 PM, Michael Hunger 
>>>>>> <[email protected]> wrote:
>>>>>>> Please read this blog post: 
>>>>>>> http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
>>>>>>> 
>>>>>>> And yes, you should use more memory than 512 MB.
>>>>>>> 
>>>>>>> -Xms4G -Xmx4G -Xmn1G
>>>>>>> 
>>>>>>> 
>>>>>>> On Wed, Jun 25, 2014 at 1:17 PM, Michael Hunger 
>>>>>>> <[email protected]> wrote:
>>>>>>>> What does your query look like?
>>>>>>>> Please switch to Neo4j 2.1.2.
>>>>>>>> 
>>>>>>>> And create indexes / constraints for the nodes you're inserting with 
>>>>>>>> merge or looking up via MATCH.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On 18.06.2014 at 08:46, Pavan Kumar <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi,
>>>>>>>>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8GB RAM. I am
>>>>>>>>> trying to import a CSV file which has 30000 records. I am using the USING
>>>>>>>>> PERIODIC COMMIT 1000 LOAD CSV command for importing, but it gives an
>>>>>>>>> unknown error. I have modified the neo4j.properties file as advised in
>>>>>>>>> the blogs. My neo4j.properties now looks like this:
>>>>>>>>> # Default values for the low-level graph engine
>>>>>>>>> 
>>>>>>>>> neostore.nodestore.db.mapped_memory=200M
>>>>>>>>> neostore.relationshipstore.db.mapped_memory=4G
>>>>>>>>> neostore.propertystore.db.mapped_memory=500M
>>>>>>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>>>>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>>>>>> 
>>>>>>>>> # Enable this to be able to upgrade a store from an older version
>>>>>>>>> allow_store_upgrade=true
>>>>>>>>> 
>>>>>>>>> # Enable this to specify a parser other than the default one.
>>>>>>>>> #cypher_parser_version=2.0
>>>>>>>>> 
>>>>>>>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>>>>>>>> # legacy reasons To limit space needed to store historical logs use values such
>>>>>>>>> # as: "7 days" or "100M size" instead of "true"
>>>>>>>>> keep_logical_logs=true
>>>>>>>>> 
>>>>>>>>> # Autoindexing
>>>>>>>>> 
>>>>>>>>> # Enable auto-indexing for nodes, default is false
>>>>>>>>> node_auto_indexing=true
>>>>>>>>> 
>>>>>>>>> # The node property keys to be auto-indexed, if enabled
>>>>>>>>> #node_keys_indexable=name,age
>>>>>>>>> 
>>>>>>>>> # Enable auto-indexing for relationships, default is false
>>>>>>>>> relationship_auto_indexing=true
>>>>>>>>> 
>>>>>>>>> # The relationship property keys to be auto-indexed, if enabled
>>>>>>>>> #relationship_keys_indexable=name,age
>>>>>>>>> 
>>>>>>>>> # Setting for Community Edition:
>>>>>>>>> cache_type=weak
>>>>>>>>> 
>>>>>>>>> I am still facing the same problem. Is there any other file where these
>>>>>>>>> properties need to be changed? Kindly help me with this issue.
>>>>>>>>> Thanks in advance
>>>>>>>>> 
>>>>>>>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>>>>>>>>>> 
>>>>>>>>>> Hi,
>>>>>>>>>> 
>>>>>>>>>> I was asked to post this here by Mark Needham (@markhneedham) who 
>>>>>>>>>> thought my query took longer than it should.
>>>>>>>>>> 
>>>>>>>>>> I'm trying to see how graph databases could be used in investigative 
>>>>>>>>>> journalism: I was loading in New York State's Active Corporations: 
>>>>>>>>>> Beginning 1800 data from 
>>>>>>>>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>>>>>>>>>>  as a 1964486-row csv (and deleted all U+F8FF characters, because I 
>>>>>>>>>> was getting "[null] is not a supported property value"). The Cypher 
>>>>>>>>>> query I used was 
>>>>>>>>>> 
>>>>>>>>>> USING PERIODIC COMMIT 500
>>>>>>>>>> LOAD CSV
>>>>>>>>>>   FROM 
>>>>>>>>>> "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>>>>>>>>>   AS company
>>>>>>>>>> CREATE (:DataActiveCorporations
>>>>>>>>>>      {
>>>>>>>>>>              DOS_ID:company[0],
>>>>>>>>>>              Current_Entity_Name:company[1],
>>>>>>>>>>              Initial_DOS_Filing_Date:company[2],
>>>>>>>>>>              County:company[3],
>>>>>>>>>>              Jurisdiction:company[4],
>>>>>>>>>>              Entity_Type:company[5],
>>>>>>>>>> 
>>>>>>>>>>              DOS_Process_Name:company[6],
>>>>>>>>>>              DOS_Process_Address_1:company[7],
>>>>>>>>>>              DOS_Process_Address_2:company[8],
>>>>>>>>>>              DOS_Process_City:company[9],
>>>>>>>>>>              DOS_Process_State:company[10],
>>>>>>>>>>              DOS_Process_Zip:company[11],
>>>>>>>>>> 
>>>>>>>>>>              CEO_Name:company[12],
>>>>>>>>>>              CEO_Address_1:company[13],
>>>>>>>>>>              CEO_Address_2:company[14],
>>>>>>>>>>              CEO_City:company[15],
>>>>>>>>>>              CEO_State:company[16],
>>>>>>>>>>              CEO_Zip:company[17],
>>>>>>>>>> 
>>>>>>>>>>              Registered_Agent_Name:company[18],
>>>>>>>>>>              Registered_Agent_Address_1:company[19],
>>>>>>>>>>              Registered_Agent_Address_2:company[20],
>>>>>>>>>>              Registered_Agent_City:company[21],
>>>>>>>>>>              Registered_Agent_State:company[22],
>>>>>>>>>>              Registered_Agent_Zip:company[23],
>>>>>>>>>> 
>>>>>>>>>>              Location_Name:company[24],
>>>>>>>>>>              Location_Address_1:company[25],
>>>>>>>>>>              Location_Address_2:company[26],
>>>>>>>>>>              Location_City:company[27],
>>>>>>>>>>              Location_State:company[28],
>>>>>>>>>>              Location_Zip:company[29]
>>>>>>>>>>      }
>>>>>>>>>> );
>>>>>>>>>> 
>>>>>>>>>> Each row is one node so it's as close to the raw data as possible. 
>>>>>>>>>> The idea is loosely that these nodes will be linked with new nodes 
>>>>>>>>>> representing people and addresses verified by reporters.
>>>>>>>>>> 
>>>>>>>>>> This is what I got:
>>>>>>>>>> 
>>>>>>>>>> +-------------------+
>>>>>>>>>> | No data returned. |
>>>>>>>>>> +-------------------+
>>>>>>>>>> Nodes created: 1964486
>>>>>>>>>> Properties set: 58934580
>>>>>>>>>> Labels added: 1964486
>>>>>>>>>> 4550855 ms
>>>>>>>>>> 
>>>>>>>>>> Some context information: 
>>>>>>>>>> Neo4j Milestone Release 2.1.0-M01
>>>>>>>>>> Windows 7
>>>>>>>>>> java version "1.7.0_03"
>>>>>>>>>> 
>>>>>>>>>> Best,
>>>>>>>>>> Aram
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> -- 
>>>>>>>>> You received this message because you are subscribed to the Google 
>>>>>>>>> Groups "Neo4j" group.
>>>>>>>>> To unsubscribe from this group and stop receiving emails from it, 
>>>>>>>>> send an email to [email protected].
>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>> 
>>>>>>> -- 
>>>>>>> You received this message because you are subscribed to a topic in the 
>>>>>>> Google Groups "Neo4j" group.
>>>>>>> To unsubscribe from this topic, visit 
>>>>>>> https://groups.google.com/d/topic/neo4j/a2DdoKkbyYo/unsubscribe.
>>>>>>> To unsubscribe from this group and all its topics, send an email to 
>>>>>>> [email protected].
>>>>>>> 
>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> -- 
>>>>>> Thanks & Regards,
>>>>>> Pavan Kumar
>>>>>> Project Engineer
>>>>>> CDAC -KP
>>>>>> Ph +91-7676367646
>>>> 
>>> 
>> 
> 