Yep,

it would also be interesting to know how you ran this. With neo4j-shell? Against a 
running server?
Did you configure any heap (RAM) or memory-mapping settings in neo4j.properties?

Check out this blog post for some hints on memory config: 
http://blog.bruggen.com/2014/02/some-neo4j-import-tweaks-what-and-where.html?view=sidebar
Note that on Windows, unlike on other operating systems, the heap settings 
include the mmio (memory-mapped I/O) settings, so the heap has to be large 
enough for both.
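
For orientation, here is a rough sketch of where those settings live. The 
property names are the ones shipped in the conf/ directory of 2.0-era 
distributions; please check them against your installed version. The values 
are placeholders chosen for illustration, not recommendations for this 
dataset or machine.

```properties
# conf/neo4j.properties: memory-mapped I/O per store file
# (illustrative values only, tune to your store sizes and available RAM)
neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=500M
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=500M
neostore.propertystore.db.arrays.mapped_memory=50M

# conf/neo4j-wrapper.conf: JVM heap in MB when running as a server
# (illustrative values only)
wrapper.java.initmemory=2048
wrapper.java.maxmemory=2048
```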

Michael

On 04.03.2014 at 17:22, Mark Needham <[email protected]> wrote:

> Hi Aram,
> 
> * Do you have any other information on the spec of the machine you're running 
> this on, e.g. how much RAM it has?
> * Have you tried increasing the PERIODIC COMMIT value? Perhaps try it with 
> a smaller subset of the data to measure the impact, using values of 
> 1,000 or 10,000. 
> * I think it would be interesting to pull out some other things as nodes as 
> well, which might lead to more interesting queries: CEO, Location, Registered 
> Agent, DOS Process and Jurisdiction could all be nodes that link back to a DOS. 
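> 
> A minimal sketch of the last two points, assuming the column layout from the 
> original query (company[4] is the jurisdiction). The Jurisdiction label, its 
> name property and the IN_JURISDICTION relationship type are hypothetical 
> names chosen for illustration, and only two of the thirty properties are 
> shown. Rows with a missing jurisdiction would probably need filtering first, 
> since MERGE does not accept null property values.
> 
> ```cypher
> // Index the merge key first so each MERGE lookup does not scan all nodes.
> CREATE INDEX ON :Jurisdiction(name);
> 
> // Larger commit batches than the original 500; try 1000 vs. 10000 on a subset.
> USING PERIODIC COMMIT 10000
> LOAD CSV
>   FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>   AS company
> // One node per row, as before (remaining properties omitted here).
> CREATE (c:DataActiveCorporations
>       {
>               DOS_ID:company[0],
>               Current_Entity_Name:company[1]
>       }
> )
> // Reuse one Jurisdiction node per distinct value instead of repeating the string.
> MERGE (j:Jurisdiction {name:company[4]})
> CREATE (c)-[:IN_JURISDICTION]->(j);
> ```
> 
> The same MERGE-then-CREATE pattern could be repeated for CEO, Location, 
> Registered Agent and DOS Process.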
> 
> Let me know if any of that doesn't make sense.
> Mark
> 
> 
> On 4 March 2014 15:54, Aram Chung <[email protected]> wrote:
> Hi,
> 
> I was asked to post this here by Mark Needham (@markhneedham) who thought my 
> query took longer than it should.
> 
> I'm trying to see how graph databases could be used in investigative 
> journalism. I loaded New York State's "Active Corporations: Beginning 
> 1800" data from 
> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
> as a 1,964,486-row CSV (after deleting all U+F8FF characters, because I was 
> getting "[null] is not a supported property value"). The Cypher query I used 
> was:
> 
> USING PERIODIC COMMIT 500
> LOAD CSV
>   FROM 
> "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>   AS company
> CREATE (:DataActiveCorporations
>       {
>               DOS_ID:company[0],
>               Current_Entity_Name:company[1],
>               Initial_DOS_Filing_Date:company[2],
>               County:company[3],
>               Jurisdiction:company[4],
>               Entity_Type:company[5],
> 
>               DOS_Process_Name:company[6],
>               DOS_Process_Address_1:company[7],
>               DOS_Process_Address_2:company[8],
>               DOS_Process_City:company[9],
>               DOS_Process_State:company[10],
>               DOS_Process_Zip:company[11],
> 
>               CEO_Name:company[12],
>               CEO_Address_1:company[13],
>               CEO_Address_2:company[14],
>               CEO_City:company[15],
>               CEO_State:company[16],
>               CEO_Zip:company[17],
> 
>               Registered_Agent_Name:company[18],
>               Registered_Agent_Address_1:company[19],
>               Registered_Agent_Address_2:company[20],
>               Registered_Agent_City:company[21],
>               Registered_Agent_State:company[22],
>               Registered_Agent_Zip:company[23],
> 
>               Location_Name:company[24],
>               Location_Address_1:company[25],
>               Location_Address_2:company[26],
>               Location_City:company[27],
>               Location_State:company[28],
>               Location_Zip:company[29]
>       }
> );
> 
> Each row is one node so it's as close to the raw data as possible. The idea 
> is loosely that these nodes will be linked with new nodes representing people 
> and addresses verified by reporters.
> 
> This is what I got:
> 
> +-------------------+
> | No data returned. |
> +-------------------+
> Nodes created: 1964486
> Properties set: 58934580
> Labels added: 1964486
> 4550855 ms
> 
> Some context information: 
> Neo4j Milestone Release 2.1.0-M01
> Windows 7
> java version "1.7.0_03"
> 
> Best,
> Aram
> 
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
> 
