Re: [Neo4j] Re: LOAD CSV takes over an hour

Michael Hunger Wed, 18 Jun 2014 00:40:25 -0700

And create indexes for all of those label + property combinations.

And for operations like this:


MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name, 
Uniprot_title: csvimport.Uniprot_Title})

please use a constraint:

create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is unique;
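Applied to the other labels that the query quoted below MERGEs on, this could look like the following sketch. It assumes each key property really identifies one node (MERGE on a single key property gives you that by construction) and that no duplicates are left over from earlier import attempts, because creating a uniqueness constraint fails if duplicates already exist. If a key is not meant to be unique, a plain CREATE INDEX ON :Label(property) is the non-unique alternative. Label and property names are copied from Pavan's query; the variable names are arbitrary.

```cypher
// Neo4j 2.x schema syntax. Run each statement on its own, before the LOAD CSV.
// A uniqueness constraint also creates the index that MERGE uses for lookups.
// Uniprotid is already covered by the constraint above.
CREATE CONSTRAINT ON (g:Gene_Name) ASSERT g.genename IS UNIQUE;
CREATE CONSTRAINT ON (p:GenBank_Protein) ASSERT p.GenBank_protein_id IS UNIQUE;
CREATE CONSTRAINT ON (g:GenBank_Gene) ASSERT g.GenBank_gene_id IS UNIQUE;
CREATE CONSTRAINT ON (p:PDBID) ASSERT p.PDBid IS UNIQUE;
CREATE CONSTRAINT ON (g:Geneatlasid) ASSERT g.Geneatlas IS UNIQUE;
CREATE CONSTRAINT ON (h:HGNCid) ASSERT h.hgnc IS UNIQUE;
CREATE CONSTRAINT ON (s:Species) ASSERT s.Species IS UNIQUE;
CREATE CONSTRAINT ON (g:Genecardid) ASSERT g.Genecard IS UNIQUE;
CREATE CONSTRAINT ON (d:DrugID) ASSERT d.DrugID IS UNIQUE;
```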

and write the MERGE on the key property only, setting the other properties
with ON CREATE SET, so it can actually leverage the index/constraint:

> MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID}) ON CREATE SET 
> uniprotid.Name=csvimport.Name,uniprotid.Uniprot_title=csvimport.Uniprot_Title
...
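Putting both suggestions together for the whole query, a sketch might look like this. It assumes the constraints/indexes for these labels have been created first, that the CSV header names are exactly the ones used in Pavan's query, and that ID alone identifies a Uniprotid node. Note that Name and Uniprot_title are only written when the node is first created, so later rows carrying different values for the same ID will not overwrite them.

```cypher
// Sketch for Neo4j 2.1.x: every node MERGE matches on one constrained key;
// additional properties are only set when the node is created.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
AS csvimport
MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID})
  ON CREATE SET uniprotid.Name = csvimport.Name,
                uniprotid.Uniprot_title = csvimport.Uniprot_Title
MERGE (genename:Gene_Name {genename: csvimport.Gene_Name})
MERGE (Genbank_prtn:GenBank_Protein {GenBank_protein_id: csvimport.GenBank_Protein_ID})
MERGE (Genbank_gene:GenBank_Gene {GenBank_gene_id: csvimport.GenBank_Gene_ID})
MERGE (pdbid:PDBID {PDBid: csvimport.PDB_ID})
MERGE (geneatlas:Geneatlasid {Geneatlas: csvimport.GenAtlas_ID})
MERGE (HGNC:HGNCid {hgnc: csvimport.HGNC_ID})
MERGE (species:Species {Species: csvimport.Species})
MERGE (genecard:Genecardid {Genecard: csvimport.GeneCard_ID})
MERGE (drugid:DrugID {DrugID: csvimport.Drug_IDs})
// Both endpoints are already bound here, so these MERGEs only check
// the relationships between those two nodes.
MERGE (uniprotid)-[:Genename]->(genename)
MERGE (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
MERGE (uniprotid)-[:GenBank_GeneID]->(Genbank_gene)
MERGE (uniprotid)-[:PDBID]->(pdbid)
MERGE (uniprotid)-[:GeneatlasID]->(geneatlas)
MERGE (uniprotid)-[:HGNCID]->(HGNC)
MERGE (uniprotid)-[:Species]->(species)
MERGE (uniprotid)-[:GenecardID]->(genecard)
MERGE (uniprotid)-[:DrugID]->(drugid)
```

The relationship part is unchanged from the original query; only the node MERGEs are rewritten so that each one can be answered by an index lookup.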

On 18.06.2014 at 09:18, Pavan Kumar <[email protected]> wrote:

> My query looks like the following:
> USING PERIODIC COMMIT 1000
> LOAD CSV WITH HEADERS FROM
> "file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
> AS csvimport
> merge (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name, 
> Uniprot_title: csvimport.Uniprot_Title})
> merge (genename:Gene_Name{genename: csvimport.Gene_Name})
> merge (Genbank_prtn:GenBank_Protein{GenBank_protein_id: 
> csvimport.GenBank_Protein_ID})
> merge (Genbank_gene:GenBank_Gene{GenBank_gene_id: csvimport.GenBank_Gene_ID})
> merge (pdbid:PDBID{PDBid: csvimport.PDB_ID})
> merge (geneatlas:Geneatlasid{Geneatlas: csvimport.GenAtlas_ID})
> merge (HGNC:HGNCid{hgnc: csvimport.HGNC_ID})
> merge (species:Species{Species: csvimport.Species})
> merge (genecard:Genecardid{Genecard: csvimport.GeneCard_ID})
> merge (drugid:DrugID{DrugID: csvimport.Drug_IDs})
> merge (uniprotid)-[:Genename]->(genename)
> merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
> merge (uniprotid)-[:GenBank_GeneID]->(Genbank_gene)
> merge (uniprotid)-[:PDBID]->(pdbid)
> merge (uniprotid)-[:GeneatlasID]->(geneatlas)
> merge (uniprotid)-[:HGNCID]->(HGNC)
> merge (uniprotid)-[:Species]->(species)
> merge (uniprotid)-[:GenecardID]->(genecard)
> merge (uniprotid)-[:DrugID]->(drugid)
> 
> I am also attaching a sample CSV file.
> As suggested, I will try the new version of Neo4j.
> 
> 
> On Wed, Jun 18, 2014 at 12:41 PM, Michael Hunger 
> <[email protected]> wrote:
> What does your query look like?
> Please switch to Neo4j 2.1.2
> 
> And create indexes / constraints for the nodes you're inserting with MERGE or 
> looking up via MATCH.
> 
> Michael
> 
> On 18.06.2014 at 08:46, Pavan Kumar <[email protected]> wrote:
> 
>> Hi,
>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8 GB of RAM. I am
>> trying to import a CSV file that has 30000 records, using USING PERIODIC
>> COMMIT 1000 LOAD CSV, but it fails with an unknown error. I have modified
>> the neo4j.properties file as advised in the blogs; it now looks like this:
>> # Default values for the low-level graph engine
>> 
>> neostore.nodestore.db.mapped_memory=200M
>> neostore.relationshipstore.db.mapped_memory=4G
>> neostore.propertystore.db.mapped_memory=500M
>> neostore.propertystore.db.strings.mapped_memory=500M
>> neostore.propertystore.db.arrays.mapped_memory=500M
>> 
>> # Enable this to be able to upgrade a store from an older version
>> allow_store_upgrade=true
>> 
>> # Enable this to specify a parser other than the default one.
>> #cypher_parser_version=2.0
>> 
>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>> # legacy reasons. To limit space needed to store historical logs use values
>> # such as: "7 days" or "100M size" instead of "true"
>> keep_logical_logs=true
>> 
>> # Autoindexing
>> 
>> # Enable auto-indexing for nodes, default is false
>> node_auto_indexing=true
>> 
>> # The node property keys to be auto-indexed, if enabled
>> #node_keys_indexable=name,age
>> 
>> # Enable auto-indexing for relationships, default is false
>> relationship_auto_indexing=true
>> 
>> # The relationship property keys to be auto-indexed, if enabled
>> #relationship_keys_indexable=name,age
>> 
>> # Setting for Community Edition:
>> cache_type=weak
>> 
>> I am still facing the same problem. Is there any other file where I should
>> change properties? Kindly help me with this issue.
>> Thanks in advance
>> 
>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>> Hi,
>> 
>> I was asked to post this here by Mark Needham (@markhneedham) who thought my 
>> query took longer than it should.
>> 
>> I'm trying to see how graph databases could be used in investigative
>> journalism: I was loading New York State's Active Corporations: Beginning
>> 1800 data from
>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>> as a 1964486-row CSV (after deleting all U+F8FF characters, because I was
>> getting "[null] is not a supported property value"). The Cypher query I used
>> was:
>> 
>> USING PERIODIC COMMIT 500
>> LOAD CSV
>>   FROM 
>> "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>   AS company
>> CREATE (:DataActiveCorporations
>>      {
>>              DOS_ID:company[0],
>>              Current_Entity_Name:company[1],
>>              Initial_DOS_Filing_Date:company[2],
>>              County:company[3],
>>              Jurisdiction:company[4],
>>              Entity_Type:company[5],
>> 
>>              DOS_Process_Name:company[6],
>>              DOS_Process_Address_1:company[7],
>>              DOS_Process_Address_2:company[8],
>>              DOS_Process_City:company[9],
>>              DOS_Process_State:company[10],
>>              DOS_Process_Zip:company[11],
>> 
>>              CEO_Name:company[12],
>>              CEO_Address_1:company[13],
>>              CEO_Address_2:company[14],
>>              CEO_City:company[15],
>>              CEO_State:company[16],
>>              CEO_Zip:company[17],
>> 
>>              Registered_Agent_Name:company[18],
>>              Registered_Agent_Address_1:company[19],
>>              Registered_Agent_Address_2:company[20],
>>              Registered_Agent_City:company[21],
>>              Registered_Agent_State:company[22],
>>              Registered_Agent_Zip:company[23],
>> 
>>              Location_Name:company[24],
>>              Location_Address_1:company[25],
>>              Location_Address_2:company[26],
>>              Location_City:company[27],
>>              Location_State:company[28],
>>              Location_Zip:company[29]
>>      }
>> );
>> 
>> Each row is one node so it's as close to the raw data as possible. The idea 
>> is loosely that these nodes will be linked with new nodes representing 
>> people and addresses verified by reporters.
>> 
>> This is what I got:
>> 
>> +-------------------+
>> | No data returned. |
>> +-------------------+
>> Nodes created: 1964486
>> Properties set: 58934580
>> Labels added: 1964486
>> 4550855 ms
>> 
>> Some context information: 
>> Neo4j Milestone Release 2.1.0-M01
>> Windows 7
>> java version "1.7.0_03"
>> 
>> Best,
>> Aram
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Neo4j" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected].
>> 
>> For more options, visit https://groups.google.com/d/optout.
> 
> 
> -- 
> Thanks & Regards,
> Pavan Kumar
> Project Engineer
> CDAC -KP
> Ph +91-7676367646
> 
> <SAmple_Drugbank.xls>
