Hi,
So my Cypher would be like this:
----------------------------------------------------------
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
AS csvimport
create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is
unique;
MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID}) ON CREATE SET
uniprotid.Name=csvimport.Name,uniprotid.Uniprot_title=csvimport.Uniprot_Title
create constraint on (genename:Gene_Name) assert genename:Gene_Name is
unique;
merge (genename:Gene_Name{genename: csvimport.Gene_Name})
and so on...
merge (uniprotid)-[:Genename]->(genename)
merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
and so on...
---------------------------------------------------------
Is that right? I tried the same statements in 2.1.2 and I am getting the
following errors:
1. Invalid input 'n': expected 'p/P' (line 5, column 20)
"create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid
is unique;"
2. Cannot merge node using null property value for uniprotid
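Or do the constraints need to be run as separate statements before the import,
with rows that have an empty ID filtered out? Something like this (just a
sketch, assuming the same file and headers):

// each constraint as its own statement, run once before the import
create constraint on (u:Uniprotid) assert u.uniprotid is unique;
create constraint on (g:Gene_Name) assert g.genename is unique;

// then the import as a separate query, skipping rows where the key is empty
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM
"file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
AS csvimport
WITH csvimport WHERE csvimport.ID IS NOT NULL
MERGE (uniprotid:Uniprotid {uniprotid: csvimport.ID}) ON CREATE SET
uniprotid.Name = csvimport.Name, uniprotid.Uniprot_title = csvimport.Uniprot_Title
and so on...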
Kindly help
On Wed, Jun 18, 2014 at 1:44 PM, Michael Hunger <
[email protected]> wrote:
> I don't understand.
>
> Michael
>
> Am 18.06.2014 um 10:11 schrieb Pavan Kumar <[email protected]>:
>
> When I use CREATE statements, they do not consider the empty fields from
> the CSV file, so I used the MERGE command.
>
>
> On Wed, Jun 18, 2014 at 1:09 PM, Michael Hunger <
> [email protected]> wrote:
>
>> And create the indexes for all those node + property
>>
>> And for operations like this:
>>
>> MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name,
>> Uniprot_title: csvimport.Uniprot_Title}
>>
>> please use a constraint:
>>
>> create constraint on (uniprotid:Uniprotid) assert uniprotid.uniprotid is
>> unique;
>>
>> and the merge operation like this, so it can actually leverage the
>> index/constraint.
>>
>> MERGE (uniprotid:Uniprotid{uniprotid: csvimport.ID}) ON CREATE SET
>> uniprotid.Name=csvimport.Name,uniprotid.Uniprot_title=csvimport.Uniprot_Title
>>
>> ...
>>
>> Am 18.06.2014 um 09:18 schrieb Pavan Kumar <[email protected]>:
>>
>> My query looks like the following:
>> USING PERIODIC COMMIT 1000
>> LOAD CSV WITH HEADERS FROM
>> "file:D:/Graph_Database/DrugBank_database/DrugbankFull_Database.csv"
>> AS csvimport
>> merge (uniprotid:Uniprotid{uniprotid: csvimport.ID, Name:csvimport.Name,
>> Uniprot_title: csvimport.Uniprot_Title})
>> merge (genename:Gene_Name{genename: csvimport.Gene_Name})
>> merge (Genbank_prtn:GenBank_Protein{GenBank_protein_id:
>> csvimport.GenBank_Protein_ID})
>> merge (Genbank_gene:GenBank_Gene{GenBank_gene_id:
>> csvimport.GenBank_Gene_ID})
>> merge (pdbid:PDBID{PDBid: csvimport.PDB_ID})
>> merge (geneatlas:Geneatlasid{Geneatlas: csvimport.GenAtlas_ID})
>> merge (HGNC:HGNCid{hgnc: csvimport.HGNC_ID})
>> merge (species:Species{Species: csvimport.Species})
>> merge (genecard:Genecardid{Genecard: csvimport.GeneCard_ID})
>> merge (drugid:DrugID{DrugID: csvimport.Drug_IDs})
>> merge (uniprotid)-[:Genename]->(genename)
>> merge (uniprotid)-[:GenBank_ProteinID]->(Genbank_prtn)
>> merge (uniprotid)-[:GenBank_GeneID]->(Genbank_gene)
>> merge (uniprotid)-[:PDBID]->(pdbid)
>> merge (uniprotid)-[:GeneatlasID]->(geneatlas)
>> merge (uniprotid)-[:HGNCID]->(HGNC)
>> merge (uniprotid)-[:Species]->(species)
>> merge (uniprotid)-[:GenecardID]->(genecard)
>> merge (uniprotid)-[:DrugID]->(drugid)
>>
>> I am attaching a sample CSV file as well; please find it.
>> As suggested, I will try the new version of Neo4j.
>>
>>
>> On Wed, Jun 18, 2014 at 12:41 PM, Michael Hunger <
>> [email protected]> wrote:
>>
>>> What does your query look like?
>>> Please switch to Neo4j 2.1.2
>>>
>>> And create indexes / constraints for the nodes you're inserting with
>>> MERGE or looking up via MATCH.
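>>> For example (a sketch; the label and property here are placeholders,
>>> substitute your own):
>>>
>>> create index on :Gene_Name(genename);
>>>
>>> or, where the property should be a unique key, a constraint instead:
>>>
>>> create constraint on (g:Gene_Name) assert g.genename is unique;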
>>>
>>> Michael
>>>
>>> Am 18.06.2014 um 08:46 schrieb Pavan Kumar <[email protected]>:
>>>
>>> Hi,
>>> I have deployed Neo4j 2.1.0-M01 on a Windows machine with 8 GB of RAM. I
>>> am trying to import a CSV file with 30,000 records. I am using the USING
>>> PERIODIC COMMIT 1000 LOAD CSV command for the import, but it gives an
>>> unknown error. I have modified the neo4j.properties file as advised in
>>> the blogs. My neo4j.properties now looks like this:
>>> # Default values for the low-level graph engine
>>>
>>> neostore.nodestore.db.mapped_memory=200M
>>> neostore.relationshipstore.db.mapped_memory=4G
>>> neostore.propertystore.db.mapped_memory=500M
>>> neostore.propertystore.db.strings.mapped_memory=500M
>>> neostore.propertystore.db.arrays.mapped_memory=500M
>>>
>>> # Enable this to be able to upgrade a store from an older version
>>> allow_store_upgrade=true
>>>
>>> # Enable this to specify a parser other than the default one.
>>> #cypher_parser_version=2.0
>>>
>>> # Keep logical logs; helps debugging but uses more disk space. Enabled for
>>> # legacy reasons. To limit the space needed to store historical logs, use
>>> # values such as "7 days" or "100M size" instead of "true".
>>> keep_logical_logs=true
>>>
>>> # Autoindexing
>>>
>>> # Enable auto-indexing for nodes, default is false
>>> node_auto_indexing=true
>>>
>>> # The node property keys to be auto-indexed, if enabled
>>> #node_keys_indexable=name,age
>>>
>>> # Enable auto-indexing for relationships, default is false
>>> relationship_auto_indexing=true
>>>
>>> # The relationship property keys to be auto-indexed, if enabled
>>> #relationship_keys_indexable=name,age
>>>
>>> # Setting for Community Edition:
>>> cache_type=weak
>>>
>>> Still I am facing the same problem. Is there another file where I should
>>> change the properties? Kindly help me with this issue.
>>> Thanks in advance.
>>>
>>> On Tuesday, 4 March 2014 21:24:03 UTC+5:30, Aram Chung wrote:
>>>>
>>>> Hi,
>>>>
>>>> I was asked to post this here by Mark Needham (@markhneedham) who
>>>> thought my query took longer than it should.
>>>>
>>>> I'm trying to see how graph databases could be used in investigative
>>>> journalism: I was loading in New York State's Active Corporations:
>>>> Beginning 1800 data from https://data.ny.gov/Economic-
>>>> Development/Active-Corporations-Beginning-1800/n9v6-gdp6 as a
>>>> 1964486-row csv (and deleted all U+F8FF characters, because I was getting
>>>> "[null] is not a supported property value"). The Cypher query I used was
>>>>
>>>> USING PERIODIC COMMIT 500
>>>> LOAD CSV
>>>> FROM "file://path/to/csv/Active_Corporations___Beginning_1800_
>>>> _without_header__wonky_characters_fixed.csv"
>>>> AS company
>>>> CREATE (:DataActiveCorporations
>>>> {
>>>> DOS_ID:company[0],
>>>> Current_Entity_Name:company[1],
>>>> Initial_DOS_Filing_Date:company[2],
>>>> County:company[3],
>>>> Jurisdiction:company[4],
>>>> Entity_Type:company[5],
>>>>
>>>> DOS_Process_Name:company[6],
>>>> DOS_Process_Address_1:company[7],
>>>> DOS_Process_Address_2:company[8],
>>>> DOS_Process_City:company[9],
>>>> DOS_Process_State:company[10],
>>>> DOS_Process_Zip:company[11],
>>>>
>>>> CEO_Name:company[12],
>>>> CEO_Address_1:company[13],
>>>> CEO_Address_2:company[14],
>>>> CEO_City:company[15],
>>>> CEO_State:company[16],
>>>> CEO_Zip:company[17],
>>>>
>>>> Registered_Agent_Name:company[18],
>>>> Registered_Agent_Address_1:company[19],
>>>> Registered_Agent_Address_2:company[20],
>>>> Registered_Agent_City:company[21],
>>>> Registered_Agent_State:company[22],
>>>> Registered_Agent_Zip:company[23],
>>>>
>>>> Location_Name:company[24],
>>>> Location_Address_1:company[25],
>>>> Location_Address_2:company[26],
>>>> Location_City:company[27],
>>>> Location_State:company[28],
>>>> Location_Zip:company[29]
>>>> }
>>>> );
>>>>
>>>> Each row is one node, so it's as close to the raw data as possible. The
>>>> idea, loosely, is that these nodes will later be linked to new nodes
>>>> representing people and addresses verified by reporters.
>>>>
>>>> This is what I got:
>>>>
>>>> +-------------------+
>>>> | No data returned. |
>>>> +-------------------+
>>>> Nodes created: 1964486
>>>> Properties set: 58934580
>>>> Labels added: 1964486
>>>> 4550855 ms
>>>>
>>>> Some context information:
>>>> Neo4j Milestone Release 2.1.0-M01
>>>> Windows 7
>>>> java version "1.7.0_03"
>>>>
>>>> Best,
>>>> Aram
>>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>>
>>> For more options, visit https://groups.google.com/d/optout.
>>>
>>>
>>>
>>
>>
>>
>> --
>> Thanks & Regards,
>> Pavan Kumar
>> Project Engineer
>> CDAC -KP
>> Ph +91-7676367646
>>
>> <SAmple_Drugbank.xls>
>>
>>
>>
>
>
>
> --
> Thanks & Regards,
> Pavan Kumar
> Project Engineer
> CDAC -KP
> Ph +91-7676367646
>
>
--
Thanks & Regards,
Pavan Kumar
Project Engineer
CDAC -KP
Ph +91-7676367646