Re: [Neo4j] Re: LOAD CSV takes over an hour

Pavan Kumar Sat, 28 Jun 2014 01:24:02 -0700

I have changed my constraints, and I am sure my labels and identifiers are
no longer the same.
But the query still runs for a long time, and in the log file I can see
"Applications threads blocked for" statements. I am also getting the same
error, "GC overhead limit exceeded". I have currently set the memory to 2 GB
in the JVM options file. Kindly tell me if I am making any mistake in my
Cypher statements:
create constraint on (ChemicalName:chemicalname) assert ChemicalName.chemicalname is unique;
create constraint on (Chemicalid:chemicalid) assert Chemicalid.chemicalid is unique;
create constraint on (Genesymb:genesymbol) assert Genesymb.genesymbol is unique;
create constraint on (GeneID:geneid) assert GeneID.geneid is unique;
create constraint on (form:geneform) assert form.geneform is unique;
create constraint on (Interac:interaction) assert Interac.interaction is unique;
create constraint on (Interactactions:interactionactions) assert Interactactions.interactionactions is unique;
create constraint on (Pubmedid:pubmed) assert Pubmedid.pubmed is unique;
create index on :ChemicalName(chemicalname);
create index on :Chemicalid(chemicalid);
create index on :Genesymb(genesymbol);
create index on :GeneID(geneid);
create index on :form(geneform);
create index on :Interac(interaction);
create index on :Interactactions(interactionactions);
create index on :Pubmedid(pubmed);
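
One possible reading of the earlier remark about labels and identifiers: in
"create constraint on (n:Label) assert n.prop is unique", the name before the
colon is only an identifier and the name after it is the label. In the
statements above, the label position holds lowercase names such as
chemicalname, and the index statements use labels such as :Genesymb; none of
these appear in the load query below (labels are case-sensitive). A minimal
sketch of constraints that would line up with that query follows. It assumes
that the labels and property keys in its MATCH and MERGE clauses are the
intended ones, and that each value really is unique per label; neither
assumption is confirmed in this thread, and n is just an arbitrary identifier.

```cypher
// Sketch only: labels and property keys are copied from the load query,
// not from a confirmed data model. "n" is an arbitrary identifier.
CREATE CONSTRAINT ON (n:Geneid) ASSERT n.geneid IS UNIQUE;
CREATE CONSTRAINT ON (n:Genesymbol) ASSERT n.genesymbol IS UNIQUE;
CREATE CONSTRAINT ON (n:ChemicalID) ASSERT n.chemicalid IS UNIQUE;
CREATE CONSTRAINT ON (n:Geneform) ASSERT n.geneform IS UNIQUE;
CREATE CONSTRAINT ON (n:Interaction) ASSERT n.interact IS UNIQUE;
CREATE CONSTRAINT ON (n:Interactionactions) ASSERT n.interr IS UNIQUE;
CREATE CONSTRAINT ON (n:PubmedID) ASSERT n.pub IS UNIQUE;
// Assumption: organism names are unique as well.
CREATE CONSTRAINT ON (n:Organism) ASSERT n.organism IS UNIQUE;
```

A uniqueness constraint is backed by an index on the same label and property,
so separate create index statements for these pairs should not be needed.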


USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS chemgeneinteractions
match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
match (genesymbol:Genesymbol{genesymbol: chemgeneinteractions.GeneSymbol})
merge (chemicalname:ChemicalID{chemicalid: chemgeneinteractions.ChemicalID, chemicalname: chemgeneinteractions.ChemicalName})
ON CREATE SET chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
merge (interations:Interaction{interact: chemgeneinteractions.Interaction})
merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
merge (interaction:Interactionactions{interr: chemgeneinteractions.InteractionActions})
merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
merge (geneid)-[:Gene_Symbol]->(genesymbol)
merge (geneid)-[:chemicalname]->(chemicalname)
merge (geneid)-[:geneform]->(genefrom)
merge (geneid)-[:Its_interaction_action]->(interactions)
merge (geneid)-[:Its_interaction]->(interaction)
merge (geneid)-[:PubmedID]->(pubmed)
merge (geneid)-[:Related_To]->(organism)
merge (genesymbol)-[:chemicalname]->(chemicalname)
merge (genesymbol)-[:geneform]->(genefrom)
merge (genesymbol)-[:geneform]->(genefrom)
merge (genesymbol)-[:Its_interaction_action]->(interactions)
merge (genesymbol)-[:Its_interaction]->(interaction)
merge (genesymbol)-[:PubmedID]->(pubmed)
merge (genesymbol)-[:Related_To]->(organism)
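
A few things in the statement above look like they could contribute to the
long run time, though this is only a reading of the query as posted, not a
confirmed diagnosis. Some identifiers are declared under one spelling and used
under another (interations/interactions, oraganism/organism,
geneform/genefrom). An identifier that is not bound earlier in the query is
treated as a new, unconstrained node, so those relationship merges can match
an arbitrary node or create an empty one for every row. The ON CREATE SET
clause also assigns chemicalname.chemicalid twice. Following the multi-pass
suggestion quoted below, a minimal sketch of two smaller passes might look
like this. It assumes that the Geneid nodes already exist (the original MATCH
implies this) and that chemicalid alone identifies a chemical; row is a
hypothetical shorter name for the CSV row.

```cypher
// Pass 1 (sketch): create the chemical nodes only.
// Assumption: chemicalid alone identifies a chemical.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MERGE (c:ChemicalID {chemicalid: row.ChemicalID})
ON CREATE SET c.chemicalname = row.ChemicalName;

// Pass 2 (sketch): connect genes to chemicals, looking both ends up first.
// Assumption: the Geneid nodes were loaded in an earlier step.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MATCH (g:Geneid {geneid: row.GeneID})
MATCH (c:ChemicalID {chemicalid: row.ChemicalID})
MERGE (g)-[:chemicalname]->(c);
```

The same two-step pattern (merge one kind of node, then match both ends and
merge the relationship) could be repeated for each of the other node types,
so that every identifier in a relationship merge is bound before it is used.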

Thanks



On Thu, Jun 26, 2014 at 6:57 PM, Michael Hunger wrote:

> Your constraints are wrong: you mixed up labels and identifiers.
>
> Please also check the index properties.
>
> And I had better success with a multi-pass approach, one pass for each set
> of elements to connect.
>
> Sent from mobile device
>
> On 26.06.2014 at 11:58, Pavan Kumar wrote:
>
> My JVM properties include
>
> # Enter one VM parameter per line, note that some parameters can only be set once.
> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
> -Xmx6144m
> Xmx4G -Xms4G -Xmn1G
>
> But I am still getting the "GC overhead limit exceeded" error. (I have
> tried heap sizes from 512 MB to 6 GB.)
> My neo4j.properties file contains
> neostore.nodestore.db.mapped_memory=100M
> neostore.relationshipstore.db.mapped_memory=2G
> neostore.propertystore.db.mapped_memory=200M
> neostore.propertystore.db.strings.mapped_memory=200M
> neostore.propertystore.db.arrays.mapped_memory=0M
>
> Are there any more suggestions for getting rid of the error?
>
>
>
>
> On Wed, Jun 25, 2014 at 6:34 PM, Pavan Kumar wrote:
>
>> My property file is
>>
>> # Default values for the low-level graph engine
>>
>> neostore.nodestore.db.mapped_memory=100M
>> neostore.relationshipstore.db.mapped_memory=2G
>> neostore.propertystore.db.mapped_memory=200M
>> neostore.propertystore.db.strings.mapped_memory=200M
>> neostore.propertystore.db.arrays.mapped_memory=0M
>>
>> # Enable this to be able to upgrade a store from an older version
>> allow_store_upgrade=true
>>
>> # Enable this to specify a parser other than the default one.
>> #cypher_parser_version=2.0
>>
>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>> # legacy reasons To limit space needed to store historical logs use values
>> # such as: "7 days" or "100M size" instead of "true"
>> keep_logical_logs=true
>>
>> # Autoindexing
>>
>> # Enable auto-indexing for nodes, default is false
>> #node_auto_indexing=true
>>
>> # The node property keys to be auto-indexed, if enabled
>> #node_keys_indexable=name,age
>>
>> # Enable auto-indexing for relationships, default is false
>> #relationship_auto_indexing=true
>>
>> # The relationship property keys to be auto-indexed, if enabled
>> #relationship_keys_indexable=name,age
>> cache_type=strong
>>
>>
>>
>> My jvm options are
>> # Enter one VM parameter per line, note that some parameters can only be set once.
>> # For example, to adjust the maximum memory usage to 512 MB, uncomment the following line
>> -Xmx512m
>>
>> But I am still getting the "GC overhead limit exceeded" error.
>> Could somebody kindly suggest a fix?
>>
>> On Wed, Jun 25, 2014 at 5:01 PM, Michael Hunger wrote:
>>
>>> use 1000 here
>>> USING PERIODIC COMMIT 1000
>>>
>>> Increase the memory settings to 6G
>>>
>>> As you run on windows:
>>>
>>> neostore.nodestore.db.mapped_memory=100M
>>> neostore.relationshipstore.db.mapped_memory=2G
>>> neostore.propertystore.db.mapped_memory=200M
>>> neostore.propertystore.db.strings.mapped_memory=200M
>>> neostore.propertystore.db.arrays.mapped_memory=0M
>>>
>>> Also, can you share your complete CSV with me privately?
>>>
>>> Do you have any nodes in your dataset that have many (100k-1M)
>>> relationships?
>>>
>>> On Wed, Jun 25, 2014 at 1:23 PM, Pavan Kumar wrote:
>>>
>>>> Hi,
>>>> My query is as follows:
>>>> create constraint on (ChemicalName:chemicalname) assert
>>>> ChemicalName.chemicalname is unique;
>>>> create constraint on (ChemicalID:chemicalid) assert
>>>> ChemicalID.chemicalid is unique;
>>>> create constraint on (Genesymbol:genesymbol) assert
>>>> Genesymbol.genesymbol is unique;
>>>> create constraint on (Geneid:geneid) assert Geneid.geneid is unique;
>>>> create constraint on (Geneform:geneform) assert Geneform.geneform is
>>>> unique;
>>>> create constraint on (Interaction:interaction) assert
>>>> Interaction.interaction is unique;
>>>> create constraint on (Interactionactions:interactionactions) assert
>>>> Interactionactions.interactionactions is unique;
>>>> create constraint on (PubmedID:pubmed) assert PubmedID.pubmed is unique;
>>>> create index on :ChemicalName(chemicalname);
>>>> create index on :ChemicalID(chemicalid);
>>>> create index on :Genesymbol(genesymbol);
>>>> create index on :Geneid(geneid);
>>>> create index on :Geneform(geneform);
>>>> create index on :Interaction(interaction);
>>>> create index on :Interactionactions(interactionactions);
>>>> create index on :PubmedID(pubmed);
>>>>
>>>> USING PERIODIC COMMIT 10000
>>>> LOAD CSV WITH HEADERS FROM
>>>> "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv"
>>>> AS chemgeneinteractions
>>>> match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
>>>> match (genesymbol:Genesymbol{genesymbol:
>>>> chemgeneinteractions.GeneSymbol})
>>>> merge (chemicalname:ChemicalID{chemicalid:
>>>> chemgeneinteractions.ChemicalID, chemicalname:
>>>> chemgeneinteractions.ChemicalName})
>>>> ON CREATE SET
>>>> chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
>>>> merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
>>>> merge (interations:Interaction{interact:
>>>> chemgeneinteractions.Interaction})
>>>> merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
>>>> merge (interaction:Interactionactions{interr:
>>>> chemgeneinteractions.InteractionActions})
>>>> merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
>>>> merge (geneid)-[:Gene_Symbol]->(genesymbol)
>>>> merge (geneid)-[:chemicalname]->(chemicalname)
>>>> merge (geneid)-[:geneform]->(genefrom)
>>>> merge (geneid)-[:Its_interaction_action]->(interactions)
>>>> merge (geneid)-[:Its_interaction]->(interaction)
>>>> merge (geneid)-[:PubmedID]->(pubmed)
>>>> merge (geneid)-[:Related_To]->(organism)
>>>> merge (genesymbol)-[:chemicalname]->(chemicalname)
>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>> merge (genesymbol)-[:geneform]->(genefrom)
>>>> merge (genesymbol)-[:Its_interaction_action]->(interactions)
>>>> merge (genesymbol)-[:Its_interaction]->(interaction)
>>>> merge (genesymbol)-[:PubmedID]->(pubmed)
>>>> merge (genesymbol)-[:Related_To]->(organism)
>>>>
>>>>
>>>>
>>>> My jvm properties are
>>>> -Xmx512m
>>>> -XX:+UseConcMarkSweepGC
>>>>
>>>>
>>>>
>>>>
>>>> On Wed, Jun 25, 2014 at 4:49 PM, Michael Hunger wrote:
>>>>
>>>>> Please read this blog post:
>>>>> http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
>>>>>
>>>>> And yes, you should use more memory than 512 MB.
>>>>>
>>>>> -Xms4G -Xmx4G -Xmn1G
>>>>>
>>>>>
>>>>> On Wed, Jun 25, 2014 at 1:17 PM, Michael Hunger wrote:
>>>>>
>>>>>> What does your query look like?
>>>>>> Please switch to Neo4j 2.1.2
>>>>>>
>>>>>> And create indexes / constraints for the nodes you're inserting with
>>>>>> merge or looking up via MATCH.
>>>>>>
>>>>>>
>>>>>> On 18.06.2014 at 08:46, Pavan Kumar wrote:
>>>>>>
>>>>>> Hi,
>>>>>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8 GB RAM. I
>>>>>> am trying to import a CSV file which has 30000 records. I am using the
>>>>>> USING PERIODIC COMMIT 1000 LOAD CSV command for importing, but it gives
>>>>>> an unknown error. I have modified the neo4j.properties file as advised
>>>>>> in the blogs. My neo4j.properties now looks like this:
>>>>>> # Default values for the low-level graph engine
>>>>>>
>>>>>> neostore.nodestore.db.mapped_memory=200M
>>>>>> neostore.relationshipstore.db.mapped_memory=4G
>>>>>> neostore.propertystore.db.mapped_memory=500M
>>>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>>>
>>>>>> # Enable this to be able to upgrade a store from an older version
>>>>>> allow_store_upgrade=true
>>>>>>
>>>>>> # Enable this to specify a parser other than the default one.
>>>>>> #cypher_parser_version=2.0
>>>>>>
>>>>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>>>>> # legacy reasons To limit space needed to store historical logs use values
>>>>>> # such as: "7 days" or "100M size" instead of "true"
>>>>>> keep_logical_logs=true
>>>>>>
>>>>>> # Autoindexing
>>>>>>
>>>>>> # Enable auto-indexing for nodes, default is false
>>>>>> node_auto_indexing=true
>>>>>>
>>>>>> # The node property keys to be auto-indexed, if enabled
>>>>>> #node_keys_indexable=name,age
>>>>>>
>>>>>> # Enable auto-indexing for relationships, default is false
>>>>>> relationship_auto_indexing=true
>>>>>>
>>>>>> # The relationship property keys to be auto-indexed, if enabled
>>>>>> #relationship_keys_indexable=name,age
>>>>>>
>>>>>> # Setting for Community Edition:
>>>>>> cache_type=weak
>>>>>>
>>>>>> I am still facing the same problem. Is there any other file where
>>>>>> properties need to be changed? Kindly help me with this issue.
>>>>>> Thanks in advance
>>>>>>
>>>>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I was asked to post this here by Mark Needham (@markhneedham) who
>>>>>>> thought my query took longer than it should.
>>>>>>>
>>>>>>> I'm trying to see how graph databases could be used in investigative
>>>>>>> journalism: I was loading in New York State's Active Corporations:
>>>>>>> Beginning 1800 data from
>>>>>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>>>>>>> as a 1964486-row csv (and deleted all U+F8FF characters, because I was
>>>>>>> getting "[null] is not a supported property value"). The Cypher query I
>>>>>>> used was
>>>>>>>
>>>>>>> USING PERIODIC COMMIT 500
>>>>>>> LOAD CSV
>>>>>>>   FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>>>>>>   AS company
>>>>>>> CREATE (:DataActiveCorporations
>>>>>>> {
>>>>>>> DOS_ID:company[0],
>>>>>>> Current_Entity_Name:company[1],
>>>>>>>  Initial_DOS_Filing_Date:company[2],
>>>>>>> County:company[3],
>>>>>>> Jurisdiction:company[4],
>>>>>>>  Entity_Type:company[5],
>>>>>>>
>>>>>>> DOS_Process_Name:company[6],
>>>>>>> DOS_Process_Address_1:company[7],
>>>>>>>  DOS_Process_Address_2:company[8],
>>>>>>> DOS_Process_City:company[9],
>>>>>>> DOS_Process_State:company[10],
>>>>>>>  DOS_Process_Zip:company[11],
>>>>>>>
>>>>>>> CEO_Name:company[12],
>>>>>>> CEO_Address_1:company[13],
>>>>>>>  CEO_Address_2:company[14],
>>>>>>> CEO_City:company[15],
>>>>>>> CEO_State:company[16],
>>>>>>>  CEO_Zip:company[17],
>>>>>>>
>>>>>>> Registered_Agent_Name:company[18],
>>>>>>> Registered_Agent_Address_1:company[19],
>>>>>>>  Registered_Agent_Address_2:company[20],
>>>>>>> Registered_Agent_City:company[21],
>>>>>>> Registered_Agent_State:company[22],
>>>>>>>  Registered_Agent_Zip:company[23],
>>>>>>>
>>>>>>> Location_Name:company[24],
>>>>>>> Location_Address_1:company[25],
>>>>>>>  Location_Address_2:company[26],
>>>>>>> Location_City:company[27],
>>>>>>> Location_State:company[28],
>>>>>>>  Location_Zip:company[29]
>>>>>>> }
>>>>>>> );
>>>>>>>
>>>>>>> Each row is one node so it's as close to the raw data as possible.
>>>>>>> The idea is loosely that these nodes will be linked with new nodes
>>>>>>> representing people and addresses verified by reporters.
>>>>>>>
>>>>>>> This is what I got:
>>>>>>>
>>>>>>> +-------------------+
>>>>>>> | No data returned. |
>>>>>>> +-------------------+
>>>>>>> Nodes created: 1964486
>>>>>>> Properties set: 58934580
>>>>>>> Labels added: 1964486
>>>>>>> 4550855 ms
>>>>>>>
>>>>>>> Some context information:
>>>>>>> Neo4j Milestone Release 2.1.0-M01
>>>>>>> Windows 7
>>>>>>> java version "1.7.0_03"
>>>>>>>
>>>>>>> Best,
>>>>>>> Aram
>>>>>>>
>>>>>>
>>>>>> --
>>>>>> You received this message because you are subscribed to the Google
>>>>>> Groups "Neo4j" group.
>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>> send an email to [email protected].
>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>
>>>>>>
>>>>>>
>>>>>  --
>>>>> You received this message because you are subscribed to a topic in the
>>>>> Google Groups "Neo4j" group.
>>>>> To unsubscribe from this topic, visit
>>>>> https://groups.google.com/d/topic/neo4j/a2DdoKkbyYo/unsubscribe.
>>>>> To unsubscribe from this group and all its topics, send an email to
>>>>> [email protected].
>>>>>
>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Thanks & Regards,
>>>> Pavan Kumar
>>>> Project Engineer
>>>> CDAC -KP
>>>> Ph +91-7676367646
>>>>
>>>
>>
>>
>>
>>
>
>
>
>



-- 
Thanks & Regards,
Pavan Kumar
Project Engineer
CDAC -KP
Ph +91-7676367646
