Your constraints are wrong: you mixed up labels and identifiers. Please also check the index properties.
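For example, the first constraint in the query quoted below declares uniqueness on a label called "chemicalname" (with the identifier ChemicalName), while the indexes and the MERGE/MATCH clauses use labels such as :ChemicalName, :ChemicalID and :Geneid. A corrected version would look something like this (a sketch only, keeping the label/property naming from the index statements):

CREATE CONSTRAINT ON (c:ChemicalName) ASSERT c.chemicalname IS UNIQUE;
CREATE CONSTRAINT ON (c:ChemicalID) ASSERT c.chemicalid IS UNIQUE;
CREATE CONSTRAINT ON (g:Geneid) ASSERT g.geneid IS UNIQUE;

Note that a uniqueness constraint is already backed by an index, so the separate CREATE INDEX statements on the same label/property pairs are then redundant.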
And I had better success doing a multi-pass for each set of elements to connect (see the sketch at the end of this thread).

Sent from mobile device

On 26.06.2014 at 11:58, Pavan Kumar <[email protected]> wrote:

> My JVM properties include
>
> # Enter one VM parameter per line, note that some parameters can only be set once.
> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
> -Xmx6144m
> Xmx4G -Xms4G -Xmn1G
>
> but I am still getting the GC overhead limit exceeded error. (I have tried from 512m to 6GB.)
> My neo4j.properties file contains
>
> neostore.nodestore.db.mapped_memory=100M
> neostore.relationshipstore.db.mapped_memory=2G
> neostore.propertystore.db.mapped_memory=200M
> neostore.propertystore.db.strings.mapped_memory=200M
> neostore.propertystore.db.arrays.mapped_memory=0M
>
> Any more suggestions to get rid of the error?
>
> On Wed, Jun 25, 2014 at 6:34 PM, Pavan Kumar <[email protected]> wrote:
>> My property file is
>>
>> # Default values for the low-level graph engine
>>
>> neostore.nodestore.db.mapped_memory=100M
>> neostore.relationshipstore.db.mapped_memory=2G
>> neostore.propertystore.db.mapped_memory=200M
>> neostore.propertystore.db.strings.mapped_memory=200M
>> neostore.propertystore.db.arrays.mapped_memory=0M
>>
>> # Enable this to be able to upgrade a store from an older version
>> allow_store_upgrade=true
>>
>> # Enable this to specify a parser other than the default one.
>> #cypher_parser_version=2.0
>>
>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>> # legacy reasons. To limit space needed to store historical logs use values such
>> # as: "7 days" or "100M size" instead of "true"
>> keep_logical_logs=true
>>
>> # Autoindexing
>>
>> # Enable auto-indexing for nodes, default is false
>> #node_auto_indexing=true
>>
>> # The node property keys to be auto-indexed, if enabled
>> #node_keys_indexable=name,age
>>
>> # Enable auto-indexing for relationships, default is false
>> #relationship_auto_indexing=true
>>
>> # The relationship property keys to be auto-indexed, if enabled
>> #relationship_keys_indexable=name,age
>>
>> cache_type=strong
>>
>> My JVM options are
>>
>> # Enter one VM parameter per line, note that some parameters can only be set once.
>> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
>> -Xmx512m
>>
>> But I am still getting the GC overhead limit exceeded error.
>> Could somebody kindly suggest what to try next?
>>
>> On Wed, Jun 25, 2014 at 5:01 PM, Michael Hunger <[email protected]> wrote:
>>> use 1000 here
>>> USING PERIODIC COMMIT 1000
>>>
>>> Increase the memory settings to 6G
>>>
>>> As you run on Windows:
>>>
>>>> neostore.nodestore.db.mapped_memory=100M
>>>> neostore.relationshipstore.db.mapped_memory=2G
>>>> neostore.propertystore.db.mapped_memory=200M
>>>> neostore.propertystore.db.strings.mapped_memory=200M
>>>> neostore.propertystore.db.arrays.mapped_memory=0M
>>>
>>> Also, can you share your complete CSV with me privately?
>>>
>>> Do you have any nodes in your dataset that have many (100k-1M) relationships?
>>>
>>> On Wed, Jun 25, 2014 at 1:23 PM, Pavan Kumar <[email protected]> wrote:
>>>> Hi,
>>>> My query is as follows:
>>>>
>>>> create constraint on (ChemicalName:chemicalname) assert ChemicalName.chemicalname is unique;
>>>> create constraint on (ChemicalID:chemicalid) assert ChemicalID.chemicalid is unique;
>>>> create constraint on (Genesymbol:genesymbol) assert Genesymbol.genesymbol is unique;
>>>> create constraint on (Geneid:geneid) assert Geneid.geneid is unique;
>>>> create constraint on (Geneform:geneform) assert Geneform.geneform is unique;
>>>> create constraint on (Interaction:interaction) assert Interaction.interaction is unique;
>>>> create constraint on (Interactionactions:interactionactions) assert Interactionactions.interactionactions is unique;
>>>> create constraint on (PubmedID:pubmed) assert PubmedID.pubmed is unique;
>>>> create index on :ChemicalName(chemicalname);
>>>> create index on :ChemicalID(chemicalid);
>>>> create index on :Genesymbol(genesymbol);
>>>> create index on :Geneid(geneid);
>>>> create index on :Geneform(geneform);
>>>> create index on :Interaction(interaction);
>>>> create index on :Interactionactions(interactionactions);
>>>> create index on :PubmedID(pubmed);
>>>>
>>>> USING PERIODIC COMMIT 10000
>>>> LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS chemgeneinteractions
>>>> match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
>>>> match (genesymbol:Genesymbol{genesymbol: chemgeneinteractions.GeneSymbol})
>>>> merge (chemicalname:ChemicalID{chemicalid: chemgeneinteractions.ChemicalID, chemicalname: chemgeneinteractions.ChemicalName})
>>>> ON CREATE SET chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
>>>> merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
>>>> merge (interations:Interaction{interact: chemgeneinteractions.Interaction})
>>>> merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
>>>> merge (interaction:Interactionactions{interr: chemgeneinteractions.InteractionActions})
>>>> merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
>>>> merge (geneid)-[:Gene_Symbol]->(genesymbol)
>>>> merge (geneid)-[:chemicalname]->(chemicalname)
>>>> merge (geneid)-[:geneform]->(genefrom)
>>>> merge (geneid)-[:Its_interaction_action]->(interactions)
>>>> merge (geneid)-[:Its_interaction]->(interaction)
>>>> merge (geneid)-[:PubmedID]->(pubmed)
>>>> merge (geneid)-[:Related_To]->(organism)
>>>> merge (genesymbol)-[:chemicalname]->(chemicalname)
>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>> merge (genesymbol)-[:Its_interaction_action]->(interactions)
>>>> merge (genesymbol)-[:Its_interaction]->(interaction)
>>>> merge (genesymbol)-[:PubmedID]->(pubmed)
>>>> merge (genesymbol)-[:Related_To]->(organism)
>>>>
>>>> My JVM properties are
>>>> -Xmx512m
>>>> -XX:+UseConcMarkSweepGC
>>>>
>>>> On Wed, Jun 25, 2014 at 4:49 PM, Michael Hunger <[email protected]> wrote:
>>>>> Please read this blog post:
>>>>> http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
>>>>>
>>>>> And yes, you should use more memory than 512 MB.
>>>>>
>>>>> -Xms4G -Xmx4G -Xmn1G
>>>>>
>>>>> On Wed, Jun 25, 2014 at 1:17 PM, Michael Hunger <[email protected]> wrote:
>>>>>> What does your query look like?
>>>>>> Please switch to Neo4j 2.1.2.
>>>>>>
>>>>>> And create indexes / constraints for the nodes you're inserting with MERGE or looking up via MATCH.
>>>>>>
>>>>>> On 18.06.2014 at 08:46, Pavan Kumar <[email protected]> wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>> I have deployed Neo4j 2.1.0-M01 on Windows on a machine with 8GB RAM. I am trying to import a CSV file which has 30000 records. I am using the USING PERIODIC COMMIT 1000 LOAD CSV command for importing, but it gives an unknown error. I have modified the neo4j.properties file as advised in the blogs. My neo4j.properties now looks like:
>>>>>>>
>>>>>>> # Default values for the low-level graph engine
>>>>>>>
>>>>>>> neostore.nodestore.db.mapped_memory=200M
>>>>>>> neostore.relationshipstore.db.mapped_memory=4G
>>>>>>> neostore.propertystore.db.mapped_memory=500M
>>>>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>>>>
>>>>>>> # Enable this to be able to upgrade a store from an older version
>>>>>>> allow_store_upgrade=true
>>>>>>>
>>>>>>> # Enable this to specify a parser other than the default one.
>>>>>>> #cypher_parser_version=2.0
>>>>>>>
>>>>>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>>>>>> # legacy reasons. To limit space needed to store historical logs use values such
>>>>>>> # as: "7 days" or "100M size" instead of "true"
>>>>>>> keep_logical_logs=true
>>>>>>>
>>>>>>> # Autoindexing
>>>>>>>
>>>>>>> # Enable auto-indexing for nodes, default is false
>>>>>>> node_auto_indexing=true
>>>>>>>
>>>>>>> # The node property keys to be auto-indexed, if enabled
>>>>>>> #node_keys_indexable=name,age
>>>>>>>
>>>>>>> # Enable auto-indexing for relationships, default is false
>>>>>>> relationship_auto_indexing=true
>>>>>>>
>>>>>>> # The relationship property keys to be auto-indexed, if enabled
>>>>>>> #relationship_keys_indexable=name,age
>>>>>>>
>>>>>>> # Setting for Community Edition:
>>>>>>> cache_type=weak
>>>>>>>
>>>>>>> Still I am facing the same problem. Is there any other file where I should change properties? Kindly help me with this issue.
>>>>>>> Thanks in advance
>>>>>>>
>>>>>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I was asked to post this here by Mark Needham (@markhneedham), who thought my query took longer than it should.
>>>>>>>>
>>>>>>>> I'm trying to see how graph databases could be used in investigative journalism: I was loading in New York State's Active Corporations: Beginning 1800 data from https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6 as a 1964486-row csv (and deleted all U+F8FF characters, because I was getting "[null] is not a supported property value").
>>>>>>>> The Cypher query I used was:
>>>>>>>>
>>>>>>>> USING PERIODIC COMMIT 500
>>>>>>>> LOAD CSV FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv" AS company
>>>>>>>> CREATE (:DataActiveCorporations
>>>>>>>> {
>>>>>>>> DOS_ID:company[0],
>>>>>>>> Current_Entity_Name:company[1],
>>>>>>>> Initial_DOS_Filing_Date:company[2],
>>>>>>>> County:company[3],
>>>>>>>> Jurisdiction:company[4],
>>>>>>>> Entity_Type:company[5],
>>>>>>>>
>>>>>>>> DOS_Process_Name:company[6],
>>>>>>>> DOS_Process_Address_1:company[7],
>>>>>>>> DOS_Process_Address_2:company[8],
>>>>>>>> DOS_Process_City:company[9],
>>>>>>>> DOS_Process_State:company[10],
>>>>>>>> DOS_Process_Zip:company[11],
>>>>>>>>
>>>>>>>> CEO_Name:company[12],
>>>>>>>> CEO_Address_1:company[13],
>>>>>>>> CEO_Address_2:company[14],
>>>>>>>> CEO_City:company[15],
>>>>>>>> CEO_State:company[16],
>>>>>>>> CEO_Zip:company[17],
>>>>>>>>
>>>>>>>> Registered_Agent_Name:company[18],
>>>>>>>> Registered_Agent_Address_1:company[19],
>>>>>>>> Registered_Agent_Address_2:company[20],
>>>>>>>> Registered_Agent_City:company[21],
>>>>>>>> Registered_Agent_State:company[22],
>>>>>>>> Registered_Agent_Zip:company[23],
>>>>>>>>
>>>>>>>> Location_Name:company[24],
>>>>>>>> Location_Address_1:company[25],
>>>>>>>> Location_Address_2:company[26],
>>>>>>>> Location_City:company[27],
>>>>>>>> Location_State:company[28],
>>>>>>>> Location_Zip:company[29]
>>>>>>>> }
>>>>>>>> );
>>>>>>>>
>>>>>>>> Each row is one node, so it's as close to the raw data as possible. The idea is loosely that these nodes will be linked with new nodes representing people and addresses verified by reporters.
>>>>>>>>
>>>>>>>> This is what I got:
>>>>>>>>
>>>>>>>> +-------------------+
>>>>>>>> | No data returned. |
>>>>>>>> +-------------------+
>>>>>>>> Nodes created: 1964486
>>>>>>>> Properties set: 58934580
>>>>>>>> Labels added: 1964486
>>>>>>>> 4550855 ms
>>>>>>>>
>>>>>>>> Some context information:
>>>>>>>> Neo4j Milestone Release 2.1.0-M01
>>>>>>>> Windows 7
>>>>>>>> java version "1.7.0_03"
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Aram
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Pavan Kumar
>>>> Project Engineer
>>>> CDAC -KP
>>>> Ph +91-7676367646
>>
>> --
>> Thanks & Regards,
>> Pavan Kumar
>> Project Engineer
>> CDAC -KP
>> Ph +91-7676367646
>
> --
> Thanks & Regards,
> Pavan Kumar
> Project Engineer
> CDAC -KP
> Ph +91-7676367646
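A sketch of the multi-pass approach mentioned at the top of this thread: rather than MERGE-ing every node and relationship type in one giant statement, run one LOAD CSV pass per set of elements. The labels, properties and file path below are taken from the query in this thread; the split into passes (and the batch size) is only an illustration:

// pass 1: one pass per node label, e.g. the chemical names
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MERGE (c:ChemicalName {chemicalname: row.ChemicalName});

// pass 2 onwards: repeat for :Geneid, :Genesymbol, :Geneform, ... then connect the
// already-created nodes in separate relationship passes, for example:
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MATCH (g:Geneid {geneid: row.GeneID})
MATCH (c:ChemicalName {chemicalname: row.ChemicalName})
MERGE (g)-[:chemicalname]->(c);

With the constraints in place, each MERGE and MATCH becomes an index lookup, and each pass keeps far less state in memory than the single combined statement. Also note that the JVM settings file quoted above expects one VM parameter per line, each starting with a dash (e.g. -Xms4G and -Xmx4G on separate lines).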
