Re: [Neo4j] LOAD CSV creates nodes but does not set properties

Paul Damian Mon, 23 Jun 2014 02:03:08 -0700

Hey, 
I'm trying to run a query that finds 10 clients and the companies they 
work for. I've used a query like this:
match (c: Client)-[WORKS_FOR]->(co: Company)  return c, co limit 10
However, it keeps failing with a Java heap space error. Neo4j is installed on a 
VM running Windows Server 2012 R2 with an Intel Xeon @ 2.27 GHz and 8 GB of RAM. The 
graph database takes up over 30 GB (which is also odd, since the SQL database that 
was used to populate the graph is only 13 GB). What can I do to improve 
the query performance besides adding indexes?
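A minimal sketch of the same query, assuming the relationship type in the store really is named WORKS_FOR and that Company nodes also carry an Id property. In Cypher, `[WORKS_FOR]` without a colon binds a variable called WORKS_FOR and matches relationships of any type, while `[:WORKS_FOR]` restricts the match to that type. Returning a few properties instead of whole nodes also keeps the result small, although that may not by itself cure a heap error on an 8 GB machine.

```cypher
// Assumption: the relationship type is WORKS_FOR and Company nodes have an Id property.
// The colon makes WORKS_FOR a relationship type rather than a variable name.
MATCH (c:Client)-[:WORKS_FOR]->(co:Company)
// Return only identifying properties instead of full nodes.
RETURN c.Id, co.Id
LIMIT 10;
```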




On Wednesday, 18 June 2014, 16:34:10 UTC+3, Michael Hunger wrote:
>
> To me it sounds as if a big cross product is happening,
>
> i.e. many Verticals with the same Id.
>
> What happens if you do:
>
> MATCH (v:Vertical)
> RETURN v.Id, count(*) 
>
> Michael
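Following Michael's suggestion, a sketch that lists only the Vertical Id values shared by more than one node (the aliases vid and copies are arbitrary names chosen for this example). An empty result would suggest the Vertical nodes are unique and the failure has another cause.

```cypher
// List Vertical Id values that appear on more than one :Vertical node.
MATCH (v:Vertical)
WITH v.Id AS vid, count(*) AS copies
WHERE copies > 1
RETURN vid, copies
ORDER BY copies DESC;
```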
>
> On 18.06.2014 at 15:26, Paul Damian <[email protected]> wrote:
>
> Hi,
>
> I've tried with another file, which contains ClientId and VerticalId. The 
> thing is, there are only 7 verticals and 11M clients, so there is an 
> obvious one-to-many relationship there.
> When I run 
> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Vertical.csv" AS c
> WITH c LIMIT 100
> MATCH (cli: Client { Id: toInt(c.ClientId)}), (vert: Vertical { Id: 
> toInt(c.VerticalId)})
> Return count(*)
> it returns Neo.DatabaseError.Statement.ExecutionFailure.
> I get the same result when I only match the verticals.
> However, if I run 
> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Vertical.csv" AS c
> WITH c LIMIT 100
> MATCH (cli: Client { Id: toInt(c.ClientId)})
> Return count(*)
>  it returns 100.
> I think it has something to do with the fact that the first 100 verticals 
> have the same Id
>
> On Wednesday, 18 June 2014, 14:20:57 UTC+3, Michael Hunger wrote:
>>
>> sorry
>>
>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" 
>> AS c
>> WITH c
>> LIMIT 100
>> MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: 
>> toInt(c.CityId)})
>> Return count(*)
>>
>>
>> On 18.06.2014 at 11:44, Paul Damian <[email protected]> wrote:
>>
>> I cannot run this command; it returns an invalid syntax error. The only way I could 
>> run it was:
>>
>>  LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" 
>> AS c
>>  MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: 
>> toInt(c.CityId)})
>> Return count(*) Limit 100
>>
>> Also, I think a Skype call would be great.
>>
>> On Tuesday, 17 June 2014, 21:36:05 UTC+3, Michael Hunger wrote:
>>>
>>> Then something is really wrong.
>>>
>>> What happens if you run:
>>>
>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>> Limit 100
>>> MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: toInt(c.CityId)})
>>> Return count(*)
>>>
>>> I'm at a conference in Amsterdam this week,
>>> but perhaps we can do a Skype call next week?
>>>
>>> Michael
>>>
>>>
>>>
>>> Sent from mobile device
>>>
>>> On 17.06.2014 at 18:48, Paul Damian <[email protected]> wrote:
>>>
>>> Yes, I do. Now I keep getting a Java heap space error. I'm using a 
>>> commit size of 100.
>>>
>>> On Tuesday, 17 June 2014, 19:28:05 UTC+3, Michael Hunger wrote:
>>>>
>>>> OK, cool. And you have the indexes for both :City(Id) and :Client(Id)?
>>>>
>>>>
>>>> Michael
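For reference, creating the indexes Michael asks about looks like this in Neo4j 2.x Cypher. The :Vertical(Id) index is an assumption, added because the later Vertical.csv lookups match on that property. Indexes are populated in the background, so it is worth creating them well before running the relationship import.

```cypher
// Schema indexes on the properties used as lookup keys in the MATCH clauses.
CREATE INDEX ON :Client(Id);
CREATE INDEX ON :City(Id);
// Assumed: the Vertical.csv import also looks up verticals by Id.
CREATE INDEX ON :Vertical(Id);
```

Alternatively, `CREATE CONSTRAINT ON (c:Client) ASSERT c.Id IS UNIQUE;` (used instead of the plain index on the same property) both enforces uniqueness and creates a backing index; it cannot be created while duplicate Ids exist, which also makes it a quick duplicate check.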
>>>>
>>>> On 17.06.2014 at 18:15, Paul Damian <[email protected]> wrote:
>>>>
>>>> The first query returns 999996, which is the number of rows in the file, 
>>>> and the second one returns Neo.DatabaseError.Statement.ExecutionFailure, 
>>>> probably because of the null values. But then I ran the following 
>>>> command:
>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" 
>>>> AS c
>>>>  MATCH (city:City { Id: toInt(c.CityId)})
>>>> WHERE coalesce(c.CityId,"") <> ""
>>>> RETURN count(*)
>>>>
>>>> and I get 992980
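Putting the filter and the lookups together, a sketch of the relationship import that discards rows with an empty CityId before any lookup is attempted. It reuses the file path and property names from the thread; the commit size of 1000 is an assumed value and should be tuned. Splitting the lookups into two MATCH clauses is only a readability choice and behaves like the comma-separated form.

```cypher
// Sketch: create LOCATED_IN relationships, skipping rows without a CityId.
// The commit size (1000) is an assumed value, not one tested in this thread.
USING PERIODIC COMMIT 1000
LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
// Drop rows whose CityId is missing or empty before doing any lookups.
WITH c
WHERE coalesce(c.CityId, "") <> ""
// Both lookups benefit from the :Client(Id) and :City(Id) indexes.
MATCH (client:Client { Id: toInt(c.Id) })
MATCH (city:City { Id: toInt(c.CityId) })
CREATE (client)-[:LOCATED_IN]->(city);
```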
>>>>
>>>>
>>>> On Tuesday, 17 June 2014, 17:55:56 UTC+3, Michael Hunger wrote:
>>>>
>>>>> No, you can just filter out the lines with no CityId.
>>>>>
>>>>> Did you run my suggested commands?
>>>>>
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> MATCH (client: Client { Id: toInt(c.Id)})
>>>>> RETURN count(*)
>>>>>
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> MATCH (city: City { Id: toInt(c.CityId)})
>>>>> RETURN count(*)
>>>>>
>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>> return c
>>>>> limit 10
>>>>>
>>>>>
>>>>> On 17.06.2014 at 16:37, Paul Damian <[email protected]> wrote:
>>>>>
>>>>> In the file I only have 2 columns: one for the client Id, which is never 
>>>>> null, and CityId, which is sometimes null. Should I export the 
>>>>> records from the SQL database leaving out the columns that contain null 
>>>>> values?
>>>>>
>>>>> On Tuesday, 17 June 2014, 15:39:14 UTC+3, Michael Hunger wrote:
>>>>>>
>>>>>> If they don't have a value for CityId, do they still have an empty 
>>>>>> column there, like "user-id,,"?
>>>>>>
>>>>>> You probably want to filter out these rows:
>>>>>>
>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>> WITH c WHERE coalesce(c.CityId,"") <> ""
>>>>>> ...
>>>>>>
>>>>>> On 17.06.2014 at 11:23, Paul Damian <[email protected]> wrote:
>>>>>>
>>>>>> Well, the CSV file contains some rows that have no value for 
>>>>>> CityId, and the rows are unique with respect to the client Id. There are 11M 
>>>>>> clients living in 14K cities. Is there a limit on the number of relationships per node?
>>>>>> In the meantime I've written a piece of code that reads from the file and creates 
>>>>>> each relationship, but, as you can imagine, it is really slow in this 
>>>>>> scenario.
>>>>>>  
>>>>>>
>>>>>>> Did you create an index on :Client(Id) and :City(Id)?
>>>>>>>
>>>>>>> What happens if you run:
>>>>>>>
>>>>>>> LOAD CSV WITH HEADERS FROM 
>>>>>>> "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>>  MATCH (client: Client { Id: toInt(c.Id)})
>>>>>>>
>>>>>>> RETURN count(*)
>>>>>>>
>>>>>>> LOAD CSV WITH HEADERS FROM 
>>>>>>> "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>>  MATCH (city: City { Id: toInt(c.CityId)})
>>>>>>>
>>>>>>> RETURN count(*)
>>>>>>>
>>>>>>> Each count should equal the number of rows in the file.
>>>>>>>
>>>>>>> Michael
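To see how many rows fail the lookup, rather than only how many succeed, a sketch using OPTIONAL MATCH (the alias unmatchedRows is an arbitrary name). A non-zero result means some Id values in the file have no corresponding :Client node or could not be converted by toInt; swapping in City and c.CityId checks the other side.

```cypher
// Count CSV rows whose Id does not resolve to an existing :Client node.
LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
OPTIONAL MATCH (client:Client { Id: toInt(c.Id) })
WITH client
WHERE client IS NULL
RETURN count(*) AS unmatchedRows;
```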
>>>>>>>
>>>>>>> On 16.06.2014 at 17:47, Paul Damian <[email protected]> wrote:
>>>>>>>
>>>>>>> Somehow I've managed to load all the nodes, and now I'm trying to 
>>>>>>> load the relationships as well. I read the nodes from a CSV file and create the 
>>>>>>> relationships between them. I run the following command:
>>>>>>> USING PERIODIC COMMIT 100 
>>>>>>>  LOAD CSV WITH HEADERS FROM 
>>>>>>> "file:/Users/pauld/Documents/LOCATED_IN.csv" AS c
>>>>>>>  MATCH (client: Client { Id: toInt(c.Id)}), (city: City { Id: 
>>>>>>> toInt(c.CityId)})
>>>>>>>  CREATE (client)-[r:LOCATED_IN]->(city)
>>>>>>>
>>>>>>> Running with a smaller commit size returns this error 
>>>>>>> Neo.DatabaseError.Statement.ExecutionFailure, while increasing the 
>>>>>>> commit size to 10000 throws 
>>>>>>> Neo.DatabaseError.General.UnknownFailure. 
>>>>>>> Can you help me with this?
>>>>>>>
>>>>>>>
>>>>>>> On Thursday, 5 June 2014, 12:05:18 UTC+3, Michael Hunger wrote:
>>>>>>>>
>>>>>>>> Perhaps something with the field or line terminators?
>>>>>>>>
>>>>>>>> I assume they break the field separation.
>>>>>>>>
>>>>>>>> Try to run:
>>>>>>>>
>>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Client.csv" 
>>>>>>>> AS c
>>>>>>>> RETURN { Id: toInt(c.Id), FirstName: c.FirstName, LastName: 
>>>>>>>> c.Lastname, Address: c.Address, ZipCode: toInt(c.ZipCode), Email: 
>>>>>>>> c.Email, 
>>>>>>>> Phone: c.Phone, Fax: c.Fax, BusinessName: c.BusinessName, URL: c.URL, 
>>>>>>>> Latitude: toFloat(c.Latitude), Longitude: toFloat(c.Longitude), 
>>>>>>>> AgencyId: 
>>>>>>>> toInt(c.AgencyId), RowStatus: toInt(c.RowStatus)} as data, c as line
>>>>>>>> LIMIT 3
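If the preview above shows the whole header collapsed into a single key, the export probably used a separator other than the comma LOAD CSV expects by default. A sketch of the same check with an explicit FIELDTERMINATOR; the semicolon is only an example, so substitute whatever separator the SQL Server export actually wrote. Keys are also case-sensitive: a lookup such as c.Lastname yields null if the header spells it LastName, and a null value in CREATE should simply not be stored as a property, which would match the "nodes without properties" symptom.

```cypher
// Preview the first rows using an explicit field separator.
// Assumption: the export used ';' here; replace it with the real separator.
LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Client.csv" AS c
FIELDTERMINATOR ';'
RETURN c
LIMIT 3;
```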
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Jun 5, 2014 at 10:51 AM, Paul Damian <[email protected]> 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I've tried using the shell and I get the same result: nodes with 
>>>>>>>>> no properties.
>>>>>>>>> I created the CSV file using the MS SQL Server export. Is that 
>>>>>>>>> relevant?
>>>>>>>>>
>>>>>>>>> About your curiosity: I figured I would import the nodes first, 
>>>>>>>>> then the relationships from the connection tables. Am I doing it 
>>>>>>>>> wrong?
>>>>>>>>>
>>>>>>>>> Thanks
>>>>>>>>>
>>>>>>>>> On Thursday, 5 June 2014, 09:54:31 UTC+3, Michael Hunger wrote:
>>>>>>>>>>
>>>>>>>>>> In your case I'd probably use a commit size of 50k or 100k.
>>>>>>>>>>
>>>>>>>>>> Try using the neo4j-shell instead of the web interface.
>>>>>>>>>>
>>>>>>>>>> Connect to neo4j using bin/neo4j-shell
>>>>>>>>>>
>>>>>>>>>> Then run your commands ending with a semicolon.
>>>>>>>>>>
>>>>>>>>>> Just curious: Your data is imported as one node per row? That's 
>>>>>>>>>> not really a graph structure.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Jun 4, 2014 at 6:56 PM, Paul Damian <[email protected]>
>>>>>>>>>>  wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi there,
>>>>>>>>>>>
>>>>>>>>>>> I'm experimenting with Neo4j while benchmarking a bunch of NoSQL 
>>>>>>>>>>> databases for my graduation paper. 
>>>>>>>>>>> I'm using the web interface to populate the database. I've been 
>>>>>>>>>>> able to load the smaller tables from my SQL database, and LOAD CSV 
>>>>>>>>>>> works fine for them.
>>>>>>>>>>> By small, I mean a few columns (4-5) and about 1 million rows. 
>>>>>>>>>>> However, when I try to load a larger table (15 columns, 12 
>>>>>>>>>>> million rows), 
>>>>>>>>>>> it creates the nodes but doesn't set any properties.
>>>>>>>>>>> I've tried reducing the number of records (to 100) and also the 
>>>>>>>>>>> number of columns (just the Id property), but no luck so far.
>>>>>>>>>>>
>>>>>>>>>>> The cypher command used is this one
>>>>>>>>>>> USING PERIODIC COMMIT 100
>>>>>>>>>>> LOAD CSV WITH HEADERS FROM "file:/Users/pauld/Documents/Client.csv" 
>>>>>>>>>>> AS c
>>>>>>>>>>> CREATE (:Client { Id: toInt(c.Id), FirstName: c.FirstName, 
>>>>>>>>>>> LastName: c.Lastname, Address: c.Address, ZipCode: 
>>>>>>>>>>> toInt(c.ZipCode), Email: 
>>>>>>>>>>> c.Email, Phone: c.Phone, Fax: c.Fax, BusinessName: c.BusinessName, 
>>>>>>>>>>> URL: 
>>>>>>>>>>> c.URL, Latitude: toFloat(c.Latitude), Longitude: 
>>>>>>>>>>> toFloat(c.Longitude), 
>>>>>>>>>>> AgencyId: toInt(c.AgencyId), RowStatus: toInt(c.RowStatus)})
>>>>>>>>>>>
>>>>>>>>>>> Any help and indication is welcomed,
>>>>>>>>>>> Paul
>>>>>>>>>>>
>>>>>>>>>>> -- 
>>>>>>>>>>> You received this message because you are subscribed to the 
>>>>>>>>>>> Google Groups "Neo4j" group.
>>>>>>>>>>> To unsubscribe from this group and stop receiving emails from 
>>>>>>>>>>> it, send an email to [email protected].
>>>>>>>>>>>
>>>>>>>>>>> For more options, visit https://groups.google.com/d/optout.
>>>>>>>>>>>