Wow this is great! I'll definitely try what you did. Please expect questions along the way.
And a write-up is coming; I was thinking I'd do that as soon as I get some relationships in, but now I should probably make a post about LOAD CSV. I'll post a link when I do.

Thanks!
Aram

On Wednesday, March 5, 2014 6:48:34 AM UTC-5, Michael Hunger wrote:
>
> Oh and btw. I would LOVE to see a blog post from you about what you're
> working on!
>
> Thanks so much
>
> Michael
>
> On 05.03.2014 at 12:00, Michael Hunger <[email protected]> wrote:
>
> > I just tested your file on MacOS with these settings
> > and got 6:30 for the 2m rows
> >
> > EXTRA_JVM_ARGUMENTS="-Xmx6G -Xms6G -Xmn1G"
> >
> > On Windows you have to add the memory from the mmio settings in
> > neo4j.properties to the heap.
> >
> > cat conf/neo4j.properties
> > # Default values for the low-level graph engine
> > neostore.nodestore.db.mapped_memory=200M
> > neostore.relationshipstore.db.mapped_memory=1G
> > neostore.propertystore.db.mapped_memory=500M
> > neostore.propertystore.db.strings.mapped_memory=250M
> > neostore.propertystore.db.arrays.mapped_memory=0M
> >
> > USING PERIODIC COMMIT 10000
> >> LOAD CSV
> >> FROM "file:///Users/mh/Downloads/Active_Corporations___Beginning_1800_no_head.csv"
> >> AS company
> >> CREATE (:DataActiveCorporations
> >> {
> >> DOS_ID:company[0],
> >> Current_Entity_Name:company[1],
> >> Initial_DOS_Filing_Date:company[2],
> > ......
> >> Registered_Agent_Zip:company[23],
> >>
> >> Location_Name:company[24],
> >> Location_Address_1:company[25],
> >> Location_Address_2:company[26],
> >> Location_City:company[27],
> >> Location_State:company[28],
> >> Location_Zip:company[29]
> >> }
> >> );
> >
> > +-------------------+
> > | No data returned. |
> > +-------------------+
> > Nodes created: 1964486
> > Properties set: 58934580
> > Labels added: 1964486
> > 391059 ms
> >
> > On 05.03.2014 at 08:34, Michael Hunger <[email protected]> wrote:
> >
> >> Oh and if you use neo4j-shell without a server, you have to set the heap
> >> in bin\Neo4jShell.bat in EXTRA_JVM_ARGUMENTS="-Xmx4G -Xms4G -Xmn1G"
> >>
> >> and call
> >>
> >> bin\Neo4jShell -conf conf\neo4j.properties -path data\graph.db
> >>
> >> On 05.03.2014 at 08:29, Michael Hunger <[email protected]> wrote:
> >>
> >>> Yep,
> >>>
> >>> it would also be interesting to know how you ran this. With neo4j-shell?
> >>> Against a running server?
> >>> Did you configure any RAM or memory mapping settings in neo4j.properties?
> >>>
> >>> Check out this blog post for some hints on memory config:
> >>> http://blog.bruggen.com/2014/02/some-neo4j-import-tweaks-what-and-where.html?view=sidebar
> >>>
> >>> Note that on Windows the heap settings include the mmio settings,
> >>> unlike on other OSes.
> >>>
> >>> Michael
> >>>
> >>> On 04.03.2014 at 17:22, Mark Needham <[email protected]> wrote:
> >>>
> >>>> Hi Aram,
> >>>>
> >>>> * Do you have any other information on the spec of the machine you're
> >>>>   running this on? e.g. how much RAM etc.
> >>>> * Have you tried upping the value given to PERIODIC COMMIT? Perhaps try
> >>>>   it out with a smaller subset of the data to measure the impact - try
> >>>>   it with values of 1,000 / 10,000 perhaps.
> >>>> * I think it would be interesting to pull out some other things as
> >>>>   nodes as well - might lead to more interesting queries e.g. CEO,
> >>>>   Location, Registered Agent, DOS Process, Jurisdiction could all be
> >>>>   nodes that link back to a DOS.
> >>>>
> >>>> Let me know if any of that doesn't make sense.
> >>>> Mark
> >>>>
> >>>>
> >>>> On 4 March 2014 15:54, Aram Chung <[email protected]> wrote:
> >>>> Hi,
> >>>>
> >>>> I was asked to post this here by Mark Needham (@markhneedham) who
> >>>> thought my query took longer than it should.
> >>>>
> >>>> I'm trying to see how graph databases could be used in investigative
> >>>> journalism: I was loading in New York State's Active Corporations:
> >>>> Beginning 1800 data from
> >>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
> >>>> as a 1964486-row csv (and deleted all U+F8FF characters, because I was
> >>>> getting "[null] is not a supported property value"). The Cypher query I
> >>>> used was
> >>>>
> >>>> USING PERIODIC COMMIT 500
> >>>> LOAD CSV
> >>>> FROM "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
> >>>> AS company
> >>>> CREATE (:DataActiveCorporations
> >>>> {
> >>>> DOS_ID:company[0],
> >>>> Current_Entity_Name:company[1],
> >>>> Initial_DOS_Filing_Date:company[2],
> >>>> County:company[3],
> >>>> Jurisdiction:company[4],
> >>>> Entity_Type:company[5],
> >>>>
> >>>> DOS_Process_Name:company[6],
> >>>> DOS_Process_Address_1:company[7],
> >>>> DOS_Process_Address_2:company[8],
> >>>> DOS_Process_City:company[9],
> >>>> DOS_Process_State:company[10],
> >>>> DOS_Process_Zip:company[11],
> >>>>
> >>>> CEO_Name:company[12],
> >>>> CEO_Address_1:company[13],
> >>>> CEO_Address_2:company[14],
> >>>> CEO_City:company[15],
> >>>> CEO_State:company[16],
> >>>> CEO_Zip:company[17],
> >>>>
> >>>> Registered_Agent_Name:company[18],
> >>>> Registered_Agent_Address_1:company[19],
> >>>> Registered_Agent_Address_2:company[20],
> >>>> Registered_Agent_City:company[21],
> >>>> Registered_Agent_State:company[22],
> >>>> Registered_Agent_Zip:company[23],
> >>>>
> >>>> Location_Name:company[24],
> >>>> Location_Address_1:company[25],
> >>>> Location_Address_2:company[26],
> >>>> Location_City:company[27],
> >>>> Location_State:company[28],
> >>>> Location_Zip:company[29]
> >>>> }
> >>>> );
> >>>>
> >>>> Each row is one node so it's as close to the raw data as possible.
> >>>> The idea is loosely that these nodes will be linked with new nodes
> >>>> representing people and addresses verified by reporters.
> >>>>
> >>>> This is what I got:
> >>>>
> >>>> +-------------------+
> >>>> | No data returned. |
> >>>> +-------------------+
> >>>> Nodes created: 1964486
> >>>> Properties set: 58934580
> >>>> Labels added: 1964486
> >>>> 4550855 ms
> >>>>
> >>>> Some context information:
> >>>> Neo4j Milestone Release 2.1.0-M01
> >>>> Windows 7
> >>>> java version "1.7.0_03"
> >>>>
> >>>> Best,
> >>>> Aram
> >>>>
> >>>> --
> >>>> You received this message because you are subscribed to the Google
> >>>> Groups "Neo4j" group.
> >>>> To unsubscribe from this group and stop receiving emails from it,
> >>>> send an email to [email protected].
> >>>> For more options, visit https://groups.google.com/groups/opt_out.
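
For anyone who wants to sanity-check the numbers quoted in this thread, here is a minimal Python sketch. It is only an illustration, not part of anyone's actual setup: the function names are made up, 1G is assumed to mean 1024M, and the two timings come from different machines and different settings (Windows 7 with PERIODIC COMMIT 500 versus MacOS with PERIODIC COMMIT 10000 and a 6G heap), so the ratio is a rough indication rather than a controlled comparison. It adds up the mapped_memory values from the quoted neo4j.properties (the amount that, per Michael's note, has to be added to the heap on Windows) and converts the two reported run times into rows per second.

```python
# Mapped-memory settings as quoted from conf/neo4j.properties in the thread.
MAPPED_MEMORY = {
    "neostore.nodestore.db.mapped_memory": "200M",
    "neostore.relationshipstore.db.mapped_memory": "1G",
    "neostore.propertystore.db.mapped_memory": "500M",
    "neostore.propertystore.db.strings.mapped_memory": "250M",
    "neostore.propertystore.db.arrays.mapped_memory": "0M",
}

# Assumption: 1G = 1024M. Only the suffixes that appear in the thread are handled.
UNITS_IN_MB = {"M": 1, "G": 1024}


def to_megabytes(size: str) -> float:
    """Convert a size string such as '200M' or '1G' to megabytes."""
    return float(size[:-1]) * UNITS_IN_MB[size[-1].upper()]


def total_mapped_mb(settings: dict) -> float:
    """Sum all mapped-memory settings, in megabytes."""
    return sum(to_megabytes(value) for value in settings.values())


def rows_per_second(rows: int, elapsed_ms: int) -> float:
    """Average import throughput for a run that took elapsed_ms milliseconds."""
    return rows / (elapsed_ms / 1000)


ROWS = 1964486          # "Nodes created" in both runs
ORIGINAL_MS = 4550855   # Aram's run
TUNED_MS = 391059       # Michael's run

mapped_mb = total_mapped_mb(MAPPED_MEMORY)
original_rate = rows_per_second(ROWS, ORIGINAL_MS)
tuned_rate = rows_per_second(ROWS, TUNED_MS)
speedup = ORIGINAL_MS / TUNED_MS

print(f"mapped memory to add to the heap on Windows: {mapped_mb:.0f} MB")
print(f"original run: {original_rate:.0f} rows/s")
print(f"tuned run:    {tuned_rate:.0f} rows/s")
print(f"ratio:        {speedup:.1f}x")
```

The mapped-memory total works out to just under 2 GB, so on Windows the -Xmx value would need roughly that much headroom on top of whatever the JVM itself needs for the import.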
