Re: [Neo4j] Re: LOAD CSV takes over an hour

Michael Hunger Wed, 25 Jun 2014 04:31:51 -0700

Use 1000 here:
USING PERIODIC COMMIT 1000
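
A minimal sketch of how the smaller batch size might look when applied to the import quoted below. It is an illustration, not a tested fix. It assumes the intended labels and key properties are the ones named in the quoted CREATE INDEX statements (for example :ChemicalID(chemicalid)), it uses the hypothetical identifiers row and chemical, and it splits the work into two simpler passes instead of one statement with many MERGE clauses.

```cypher
// Pass 1 (sketch): create each chemical node once, committing every 1000 rows.
// Assumes the index on :ChemicalID(chemicalid) from the quoted script exists.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MERGE (chemical:ChemicalID {chemicalid: row.ChemicalID})
  ON CREATE SET chemical.chemicalname = row.ChemicalName;

// Pass 2 (sketch): connect nodes that already exist, one relationship type per pass.
// Assumes the :Geneid nodes were loaded earlier, as the MATCH in the quoted query implies.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv" AS row
MATCH (geneid:Geneid {geneid: row.GeneID})
MATCH (chemical:ChemicalID {chemicalid: row.ChemicalID})
MERGE (geneid)-[:chemicalname]->(chemical);
```

Each statement is meant to be run on its own. The MERGE key matches an indexed property, and every identifier used in a relationship pattern is bound earlier in the same statement. An unbound identifier such as genefrom in the quoted query can make MERGE create a new, empty node rather than link to the node merged earlier.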

Increase the memory settings to 6G


As you are running on Windows:

neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=200M
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=0M

Also, can you share your complete CSV with me privately?

Do you have any nodes in your dataset that have many (100k-1M)
relationships?
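
One way to look for such heavily connected nodes, once part of the data is loaded, is a degree count. This is only a sketch: the aliases degree and nodeId are hypothetical, and the query scans the whole graph, so it is meant for a small or partially loaded store.

```cypher
// Sketch only: list the ten nodes with the most relationships (either direction).
MATCH (n)-[r]-()
WITH n, count(r) AS degree
RETURN labels(n) AS labels, id(n) AS nodeId, degree
ORDER BY degree DESC
LIMIT 10;
```

Any row with a degree in the 100k-1M range would be a node of the kind asked about above.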

On Wed, Jun 25, 2014 at 1:23 PM, Pavan Kumar <[email protected]>
wrote:

> Hi,
> My query is as follows:
> create constraint on (ChemicalName:chemicalname) assert
> ChemicalName.chemicalname is unique;
> create constraint on (ChemicalID:chemicalid) assert ChemicalID.chemicalid
> is unique;
> create constraint on (Genesymbol:genesymbol) assert Genesymbol.genesymbol
> is unique;
> create constraint on (Geneid:geneid) assert Geneid.geneid is unique;
> create constraint on (Geneform:geneform) assert Geneform.geneform is
> unique;
> create constraint on (Interaction:interaction) assert
> Interaction.interaction is unique;
> create constraint on (Interactionactions:interactionactions) assert
> Interactionactions.interactionactions is unique;
> create constraint on (PubmedID:pubmed) assert PubmedID.pubmed is unique;
> create index on :ChemicalName(chemicalname);
> create index on :ChemicalID(chemicalid);
> create index on :Genesymbol(genesymbol);
> create index on :Geneid(geneid);
> create index on :Geneform(geneform);
> create index on :Interaction(interaction);
> create index on :Interactionactions(interactionactions);
> create index on :PubmedID(pubmed);
>
> USING PERIODIC COMMIT 10000
> LOAD CSV WITH HEADERS FROM
> "file:D:/Graph_Database/CTD/CTD_chem_gene_ixns.csv"
> AS chemgeneinteractions
> match (geneid:Geneid{geneid: chemgeneinteractions.GeneID})
> match (genesymbol:Genesymbol{genesymbol: chemgeneinteractions.GeneSymbol})
> merge (chemicalname:ChemicalID{chemicalid:
> chemgeneinteractions.ChemicalID, chemicalname:
> chemgeneinteractions.ChemicalName})
> ON CREATE SET
> chemicalname.chemicalid=chemgeneinteractions.ChemicalID,chemicalname.chemicalid=chemgeneinteractions.ChemicalName
> merge (geneform:Geneform {geneform: chemgeneinteractions.GeneForms})
> merge (interations:Interaction{interact: chemgeneinteractions.Interaction})
> merge (oraganism:Organism{organism: chemgeneinteractions.Organism})
> merge (interaction:Interactionactions{interr:
> chemgeneinteractions.InteractionActions})
> merge (pubmed:PubmedID{pub: chemgeneinteractions.PubMedIDs})
> merge (geneid)-[:Gene_Symbol]->(genesymbol)
> merge (geneid)-[:chemicalname]->(chemicalname)
> merge (geneid)-[:geneform]->(genefrom)
> merge (geneid)-[:Its_interaction_action]->(interactions)
> merge (geneid)-[:Its_interaction]->(interaction)
> merge (geneid)-[:PubmedID]->(pubmed)
> merge (geneid)-[:Related_To]->(organism)
> merge (genesymbol)-[:chemicalname]->(chemicalname)
> merge (genesymbol)-[:geneform]->(genefrom)
> merge (genesymbol)-[:geneform]->(genefrom)
> merge (genesymbol)-[:Its_interaction_action]->(interactions)
> merge (genesymbol)-[:Its_interaction]->(interaction)
> merge (genesymbol)-[:PubmedID]->(pubmed)
> merge (genesymbol)-[:Related_To]->(organism)
>
>
>
> My jvm properties are
> -Xmx512m
> -XX:+UseConcMarkSweepGC
>
>
>
>
> On Wed, Jun 25, 2014 at 4:49 PM, Michael Hunger <
> [email protected]> wrote:
>
>> Please read this blog post:
>> http://jexp.de/blog/2014/06/load-csv-into-neo4j-quickly-and-successfully/
>>
>> And yes, you should use more memory than 512 MB.
>>
>> -Xms4G -Xmx4G -Xmn1G
>>
>>
>> On Wed, Jun 25, 2014 at 1:17 PM, Michael Hunger <
>> [email protected]> wrote:
>>
>>> What does your query look like?
>>> Please switch to Neo4j 2.1.2
>>>
>>> And create indexes / constraints for the nodes you're inserting with
>>> merge or looking up via MATCH.
>>>
>>>
>>> Am 18.06.2014 um 08:46 schrieb Pavan Kumar <[email protected]>:
>>>
>>>  Hi,
>>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8 GB RAM. I am
>>> trying to import a CSV file with 30000 records. I am using the USING
>>> PERIODIC COMMIT 1000 LOAD CSV command for the import, but it gives an
>>> unknown error. I have modified the neo4j.properties file as advised in the
>>> blogs. My neo4j.properties now looks like this:
>>> # Default values for the low-level graph engine
>>>
>>> neostore.nodestore.db.mapped_memory=200M
>>> neostore.relationshipstore.db.mapped_memory=4G
>>> neostore.propertystore.db.mapped_memory=500M
>>> neostore.propertystore.db.strings.mapped_memory=500M
>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>
>>> # Enable this to be able to upgrade a store from an older version
>>> allow_store_upgrade=true
>>>
>>> # Enable this to specify a parser other than the default one.
>>> #cypher_parser_version=2.0
>>>
>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>> # legacy reasons To limit space needed to store historical logs use values such
>>> # as: "7 days" or "100M size" instead of "true"
>>> keep_logical_logs=true
>>>
>>> # Autoindexing
>>>
>>> # Enable auto-indexing for nodes, default is false
>>> node_auto_indexing=true
>>>
>>> # The node property keys to be auto-indexed, if enabled
>>> #node_keys_indexable=name,age
>>>
>>> # Enable auto-indexing for relationships, default is false
>>> relationship_auto_indexing=true
>>>
>>> # The relationship property keys to be auto-indexed, if enabled
>>> #relationship_keys_indexable=name,age
>>>
>>> # Setting for Community Edition:
>>> cache_type=weak
>>>
>>> Still, I am facing the same problem. Is there any other file where I need
>>> to change properties? Kindly help me with this issue.
>>> Thanks in advance
>>>
>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was asked to post this here by Mark Needham (@markhneedham) who
>>>> thought my query took longer than it should.
>>>>
>>>> I'm trying to see how graph databases could be used in investigative
>>>> journalism: I was loading in New York State's Active Corporations:
>>>> Beginning 1800 data from
>>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>>>> as a 1964486-row csv (and deleted all U+F8FF characters, because I was getting
>>>> "[null] is not a supported property value"). The Cypher query I used was
>>>>
>>>> USING PERIODIC COMMIT 500
>>>> LOAD CSV
>>>>   FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>>>    AS company
>>>> CREATE (:DataActiveCorporations
>>>> {
>>>> DOS_ID:company[0],
>>>> Current_Entity_Name:company[1],
>>>>  Initial_DOS_Filing_Date:company[2],
>>>> County:company[3],
>>>> Jurisdiction:company[4],
>>>>  Entity_Type:company[5],
>>>>
>>>> DOS_Process_Name:company[6],
>>>> DOS_Process_Address_1:company[7],
>>>>  DOS_Process_Address_2:company[8],
>>>> DOS_Process_City:company[9],
>>>> DOS_Process_State:company[10],
>>>>  DOS_Process_Zip:company[11],
>>>>
>>>> CEO_Name:company[12],
>>>> CEO_Address_1:company[13],
>>>>  CEO_Address_2:company[14],
>>>> CEO_City:company[15],
>>>> CEO_State:company[16],
>>>>  CEO_Zip:company[17],
>>>>
>>>> Registered_Agent_Name:company[18],
>>>> Registered_Agent_Address_1:company[19],
>>>>  Registered_Agent_Address_2:company[20],
>>>> Registered_Agent_City:company[21],
>>>> Registered_Agent_State:company[22],
>>>>  Registered_Agent_Zip:company[23],
>>>>
>>>> Location_Name:company[24],
>>>> Location_Address_1:company[25],
>>>>  Location_Address_2:company[26],
>>>> Location_City:company[27],
>>>> Location_State:company[28],
>>>>  Location_Zip:company[29]
>>>> }
>>>> );
>>>>
>>>> Each row is one node so it's as close to the raw data as possible. The
>>>> idea is loosely that these nodes will be linked with new nodes representing
>>>> people and addresses verified by reporters.
>>>>
>>>> This is what I got:
>>>>
>>>> +-------------------+
>>>> | No data returned. |
>>>> +-------------------+
>>>> Nodes created: 1964486
>>>> Properties set: 58934580
>>>> Labels added: 1964486
>>>> 4550855 ms
>>>>
>>>> Some context information:
>>>> Neo4j Milestone Release 2.1.0-M01
>>>> Windows 7
>>>> java version "1.7.0_03"
>>>>
>>>> Best,
>>>> Aram
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>>
>
>
>
> --
> Thanks & Regards,
> Pavan Kumar
> Project Engineer
> CDAC -KP
> Ph +91-7676367646
>

