Re: [Neo4j] LOAD CSV takes over an hour

david fauth Wed, 18 Jun 2014 07:51:06 -0700

Run the CREATE CONSTRAINT commands first, each as a separate statement, and then run your LOAD CSV command.
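In Neo4j 2.x the parser does not accept CREATE CONSTRAINT in the middle of a LOAD CSV query, which fits the "Invalid input" error in the quoted message. Schema statements have to be run on their own, one at a time, before the import. As a sketch, a few lines of Python can print the constraint statements to run first. The helper name and the KEYS list are hypothetical; the label/key pairs are read off the MERGE clauses in the query further down this thread. Note that the asserted target is always a property (genename.genename), not a label (genename:Gene_Name).

```python
# Sketch (hypothetical helper, not part of Neo4j): print one uniqueness
# constraint per (label, key property) pair taken from the MERGE query in
# this thread, using the Neo4j 2.x "CREATE CONSTRAINT ON ... ASSERT" syntax.
# Each printed line is meant to be run on its own, BEFORE the LOAD CSV query.

KEYS = [
    ("Uniprotid", "uniprotid"),
    ("Gene_Name", "genename"),
    ("GenBank_Protein", "GenBank_protein_id"),
    ("GenBank_Gene", "GenBank_gene_id"),
    ("PDBID", "PDBid"),
    ("Geneatlasid", "Geneatlas"),
    ("HGNCid", "hgnc"),
    ("Species", "Species"),
    ("Genecardid", "Genecard"),
    ("DrugID", "DrugID"),
]


def constraint_statements(keys):
    """Return one CREATE CONSTRAINT statement per (label, property) pair.

    The asserted thing is always a property (n.prop), never a label (n:Label).
    """
    return [
        f"CREATE CONSTRAINT ON (n:{label}) ASSERT n.{prop} IS UNIQUE;"
        for label, prop in keys
    ]


if __name__ == "__main__":
    for statement in constraint_statements(KEYS):
        print(statement)
```

Once these exist, the LOAD CSV statement itself should contain only USING PERIODIC COMMIT, LOAD CSV and the MERGE clauses, each MERGE on its key property alone, with ON CREATE SET for the remaining properties.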
 

On Wednesday, June 18, 2014 5:13:42 AM UTC-4, Pavan Kumar wrote:


> Hi,
> So my Cypher will look like this:
> ----------------------------------------------------------
>  USING PERIODIC COMMIT 1000
> LOAD CSV WITH HEADERS FROM
> "file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
> AS csvimport
> create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is 
> unique;
> MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID}) ON CREATE SET 
> uniprotid.Name=csvimport.Name,uniprotid.Uniprot_title=csvimport.Uniprot_Title
> create constraint on (genename:Gene_Name) assert genename:Gene_Name is 
> unique;
> merge (genename:Gene_Name{genename: csvimport.Gene_Name})
>  and so on...
>  merge (uniprotid)-[:Genename]->(genename)
> merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
> and so on...
> ---------------------------------------------------------
> Is that right? I tried the same statements in 2.1.2 and I am getting
> the following errors:
>
> 1. Invalid input 'n': expected 'p/P' (line 5, column 20)
>
> "create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is 
> unique;"
>
> 2. Cannot merge node using null property value for uniprotid
>
> Kindly help
>
>
>
>
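The second error means csvimport.ID evaluated to null for at least one row, and MERGE will not match or create a node on a null key. That typically happens when a row's ID field is empty, or when the header row has no column named exactly ID. A minimal sketch, assuming a UTF-8 CSV with a header row (the function name and sample data are hypothetical), that lists the offending rows before running the import:

```python
import csv
import io


def rows_missing_key(csv_text, key="ID"):
    """Return 1-based data-row numbers whose `key` column is empty or missing.

    Such rows would make a MERGE on that column fail with
    "Cannot merge node using null property value". Raises KeyError when
    the header has no `key` column at all, since then every row is null.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or key not in reader.fieldnames:
        raise KeyError(f"header has no column named {key!r}")
    bad_rows = []
    for row_number, row in enumerate(reader, start=1):
        value = row.get(key)  # None if the row is shorter than the header
        if value is None or value.strip() == "":
            bad_rows.append(row_number)
    return bad_rows


if __name__ == "__main__":
    # Hypothetical sample: row 2 has no ID, row 3 has no Gene_Name.
    sample = "ID,Name,Gene_Name\nP1,alpha,G1\n,beta,G2\nP3,gamma,\n"
    print(rows_missing_key(sample))               # → [2]
    print(rows_missing_key(sample, "Gene_Name"))  # → [3]
```

The same check applies to every other column used as a MERGE key (Gene_Name, PDB_ID and so on). Inside the import itself, such rows can be skipped with a filter before the MERGE, for example WITH csvimport WHERE csvimport.ID IS NOT NULL.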
> On Wed, Jun 18, 2014 at 1:44 PM, Michael Hunger <[email protected]> wrote:
>
>> I don't understand. 
>>
>> Michael
>>
>> On 18.06.2014 at 10:11, Pavan Kumar <[email protected]> wrote:
>>  
>> When I use CREATE statements, the empty fields from the CSV file are not
>> taken into account, so I used the MERGE command instead.
>>
>>
>> On Wed, Jun 18, 2014 at 1:09 PM, Michael Hunger <[email protected]> wrote:
>>
>>> And create indexes for all those label + property combinations.
>>>
>>> And for operations like this: 
>>>
>>> MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name,
>>> Uniprot_title: csvimport.Uniprot_Title})
>>>
>>> please use a constraint:
>>>
>>> create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is 
>>> unique;
>>>
>>> and write the MERGE like this, so it can actually leverage the
>>> index/constraint:
>>>
>>>  MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID}) ON CREATE SET 
>>> uniprotid.Name=csvimport.Name,uniprotid.Uniprot_title=csvimport.Uniprot_Title
>>>
>>> ...
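To illustrate the difference described above, here is a toy in-memory model in Python (plain lists and dicts, not Neo4j; the function names and rows are hypothetical). MERGE on the full property map only reuses a node whose every listed property is equal, so the same ID with a differently spelled Name is treated as a new node; in real Neo4j, with a unique constraint on uniprotid, that create would typically fail with a constraint violation instead. MERGE on the key alone is a single keyed lookup, the kind a unique constraint's index can serve, and ON CREATE SET fills in the other properties only when the node is new.

```python
# Toy model only: lists/dicts standing in for nodes, NOT Neo4j internals.


def merge_full(nodes, props):
    """Like MERGE (n {all props}): reuse a node only if every property matches."""
    for node in nodes:
        if node == props:
            return node
    node = dict(props)
    nodes.append(node)
    return node


def merge_on_key(index, nodes, key, on_create):
    """Like MERGE (n {uniprotid: key}) ON CREATE SET ...: one keyed lookup."""
    node = index.get(key)  # stands in for the unique constraint's index
    if node is None:
        node = {"uniprotid": key, **on_create}  # ON CREATE SET runs only here
        index[key] = node
        nodes.append(node)
    return node


if __name__ == "__main__":
    # Hypothetical rows: same ID twice, Name spelled differently.
    rows = [{"ID": "P1", "Name": "alpha"}, {"ID": "P1", "Name": "Alpha"}]

    full_nodes = []
    for r in rows:
        merge_full(full_nodes, {"uniprotid": r["ID"], "Name": r["Name"]})

    index, key_nodes = {}, []
    for r in rows:
        merge_on_key(index, key_nodes, r["ID"], {"Name": r["Name"]})

    print(len(full_nodes), len(key_nodes))  # → 2 1
```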
>>>
>>> On 18.06.2014 at 09:18, Pavan Kumar <[email protected]> wrote:
>>>
>>> My query looks like the following:
>>> USING PERIODIC COMMIT 1000
>>> LOAD CSV WITH HEADERS FROM
>>> "file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
>>> AS csvimport
>>> merge (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name, 
>>> Uniprot_title: csvimport.Uniprot_Title})
>>> merge (genename:Gene_Name{genename: csvimport.Gene_Name})
>>> merge (Genbank_prtn:GenBank_Protein{GenBank_protein_id: 
>>> csvimport.GenBank_Protein_ID})
>>> merge (Genbank_gene:GenBank_Gene{GenBank_gene_id: 
>>> csvimport.GenBank_Gene_ID})
>>> merge (pdbid:PDBID{PDBid: csvimport.PDB_ID})
>>> merge (geneatlas:Geneatlasid{Geneatlas: csvimport.GenAtlas_ID})
>>> merge (HGNC:HGNCid{hgnc: csvimport.HGNC_ID})
>>> merge (species:Species{Species: csvimport.Species})
>>> merge (genecard:Genecardid{Genecard: csvimport.GeneCard_ID})
>>> merge (drugid:DrugID{DrugID: csvimport.Drug_IDs})
>>> merge (uniprotid)-[:Genename]->(genename)
>>> merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
>>> merge (uniprotid)-[:GenBank_GeneID]->(Genbank_gene)
>>> merge (uniprotid)-[:PDBID]->(pdbid)
>>> merge (uniprotid)-[:GeneatlasID]->(geneatlas)
>>> merge (uniprotid)-[:HGNCID]->(HGNC)
>>> merge (uniprotid)-[:Species]->(species)
>>> merge (uniprotid)-[:GenecardID]->(genecard)
>>> merge (uniprotid)-[:DrugID]->(drugid)
>>>
>>> I am also attaching a sample CSV file.
>>> As suggested, I will try the new version of Neo4j.
>>>
>>>
>>> On Wed, Jun 18, 2014 at 12:41 PM, Michael Hunger <[email protected]> wrote:
>>>
>>>> What does your query look like? 
>>>> Please switch to Neo4j 2.1.2
>>>>
>>>> And create indexes / constraints for the nodes you're inserting with 
>>>> MERGE or looking up via MATCH.
>>>>
>>>> Michael
>>>>
>>>> On 18.06.2014 at 08:46, Pavan Kumar <[email protected]> wrote:
>>>>
>>>> Hi,
>>>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8 GB RAM. I am
>>>> trying to import a CSV file with 30000 records. I am using the USING
>>>> PERIODIC COMMIT 1000 LOAD CSV command for the import, but it fails with an
>>>> unknown error. I have modified the neo4j.properties file as advised in
>>>> blogs. My neo4j.properties now looks like this:
>>>> # Default values for the low-level graph engine
>>>>
>>>> neostore.nodestore.db.mapped_memory=200M
>>>> neostore.relationshipstore.db.mapped_memory=4G
>>>> neostore.propertystore.db.mapped_memory=500M
>>>> neostore.propertystore.db.strings.mapped_memory=500M
>>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>>
>>>> # Enable this to be able to upgrade a store from an older version
>>>> allow_store_upgrade=true
>>>>
>>>> # Enable this to specify a parser other than the default one.
>>>> #cypher_parser_version=2.0
>>>>
>>>> # Keep logical logs, helps debugging but uses more disk space, enabled for
>>>> # legacy reasons. To limit space needed to store historical logs use values
>>>> # such as: "7 days" or "100M size" instead of "true"
>>>> keep_logical_logs=true
>>>>
>>>> # Autoindexing
>>>>
>>>> # Enable auto-indexing for nodes, default is false
>>>> node_auto_indexing=true
>>>>
>>>> # The node property keys to be auto-indexed, if enabled
>>>> #node_keys_indexable=name,age
>>>>
>>>> # Enable auto-indexing for relationships, default is false
>>>> relationship_auto_indexing=true
>>>>
>>>> # The relationship property keys to be auto-indexed, if enabled
>>>> #relationship_keys_indexable=name,age
>>>>
>>>> # Setting for Community Edition:
>>>> cache_type=weak
>>>>
>>>> I am still facing the same problem. Is there any other file where I should
>>>> change properties? Kindly help me with this issue.
>>>> Thanks in advance
>>>>
>>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote: 
>>>>>
>>>>>  Hi,
>>>>>
>>>>> I was asked to post this here by Mark Needham (@markhneedham) who 
>>>>> thought my query took longer than it should.
>>>>>
>>>>> I'm trying to see how graph databases could be used in investigative
>>>>> journalism: I was loading New York State's "Active Corporations:
>>>>> Beginning 1800" data from
>>>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>>>>> as a 1964486-row CSV (I deleted all U+F8FF characters, because I was
>>>>> getting "[null] is not a supported property value"). The Cypher query I
>>>>> used was:
>>>>>
>>>>> USING PERIODIC COMMIT 500
>>>>> LOAD CSV
>>>>>   FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>>>>   AS company
>>>>> CREATE (:DataActiveCorporations
>>>>> {
>>>>> DOS_ID:company[0],
>>>>> Current_Entity_Name:company[1],
>>>>> Initial_DOS_Filing_Date:company[2],
>>>>> County:company[3],
>>>>> Jurisdiction:company[4],
>>>>> Entity_Type:company[5],
>>>>>
>>>>> DOS_Process_Name:company[6],
>>>>> DOS_Process_Address_1:company[7],
>>>>> DOS_Process_Address_2:company[8],
>>>>> DOS_Process_City:company[9],
>>>>> DOS_Process_State:company[10],
>>>>> DOS_Process_Zip:company[11],
>>>>>
>>>>> CEO_Name:company[12],
>>>>> CEO_Address_1:company[13],
>>>>> CEO_Address_2:company[14],
>>>>> CEO_City:company[15],
>>>>> CEO_State:company[16],
>>>>> CEO_Zip:company[17],
>>>>>
>>>>> Registered_Agent_Name:company[18],
>>>>> Registered_Agent_Address_1:company[19],
>>>>> Registered_Agent_Address_2:company[20],
>>>>> Registered_Agent_City:company[21],
>>>>> Registered_Agent_State:company[22],
>>>>> Registered_Agent_Zip:company[23],
>>>>>
>>>>> Location_Name:company[24],
>>>>> Location_Address_1:company[25],
>>>>> Location_Address_2:company[26],
>>>>> Location_City:company[27],
>>>>> Location_State:company[28],
>>>>> Location_Zip:company[29]
>>>>> }
>>>>> );
>>>>>
>>>>> Each row is one node, so it's as close to the raw data as possible. The
>>>>> idea is loosely that these nodes will be linked with new nodes
>>>>> representing people and addresses verified by reporters.
>>>>>
>>>>> This is what I got:
>>>>>
>>>>> +-------------------+
>>>>> | No data returned. |
>>>>> +-------------------+
>>>>> Nodes created: 1964486
>>>>> Properties set: 58934580
>>>>> Labels added: 1964486
>>>>> 4550855 ms
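For scale, a quick check of the numbers above (plain arithmetic, nothing Neo4j-specific; the variable names are just labels): 4550855 ms is about 76 minutes, roughly 430 rows per second, and 58934580 properties over 1964486 nodes is exactly 30 per node, one per CSV column company[0] to company[29].

```python
nodes_created = 1964486
properties_set = 58934580
elapsed_ms = 4550855

minutes = elapsed_ms / 60_000
rows_per_second = nodes_created / (elapsed_ms / 1000)
properties_per_node = properties_set / nodes_created

print(f"{minutes:.1f} min")             # → 75.8 min
print(f"{rows_per_second:.0f} rows/s")  # → 432 rows/s
print(properties_per_node)              # → 30.0
```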
>>>>>
>>>>> Some context information: 
>>>>> Neo4j Milestone Release 2.1.0-M01
>>>>> Windows 7
>>>>> java version "1.7.0_03"
>>>>>
>>>>> Best,
>>>>> Aram
>>>>>
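The pre-processing step Aram describes (deleting every U+F8FF character, a Private Use Area code point, which he removed because of the "[null] is not a supported property value" error) can be scripted rather than done by hand. A minimal sketch, assuming the file is UTF-8 encoded; the function names and file paths are hypothetical:

```python
def strip_chars(text, chars="\uf8ff"):
    """Remove every character in `chars` (default: U+F8FF) from `text`."""
    return text.translate({ord(c): None for c in chars})


def clean_csv_file(src_path, dst_path, chars="\uf8ff", encoding="utf-8"):
    """Copy src_path to dst_path line by line, dropping the given characters.

    newline="" keeps the file's original line endings (and any newlines
    inside quoted CSV fields) untouched.
    """
    with open(src_path, encoding=encoding, newline="") as src, open(
        dst_path, "w", encoding=encoding, newline=""
    ) as dst:
        for line in src:
            dst.write(strip_chars(line, chars))


if __name__ == "__main__":
    print(strip_chars("ACME\uf8ff HOLDINGS"))  # → ACME HOLDINGS
```

Streaming line by line keeps memory flat even for a file of about two million rows.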
>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "Neo4j" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to [email protected] <javascript:>. 
>>>>
>>>> For more options, visit https://groups.google.com/d/optout.
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> -- 
>>> Thanks & Regards,
>>> Pavan Kumar
>>> Project Engineer
>>> CDAC -KP
>>> Ph +91-7676367646 
>>>
>>> <SAmple_Drugbank.xls>
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>
>
>
>

