I just tested your file on macOS with the settings below
and got about 6:30 (min:sec) for the 2M rows:
EXTRA_JVM_ARGUMENTS="-Xmx6G -Xms6G -Xmn1G"
On Windows you have to add the memory from the mmio (memory-mapped I/O)
settings in neo4j.properties to the heap:
cat conf/neo4j.properties
# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=250M
neostore.propertystore.db.arrays.mapped_memory=0M
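
To put a number on the Windows rule, here is a minimal Python sketch that sums
the mapped_memory values listed above and adds them to a base heap. It assumes
"M" means MB and "G" means 1024 MB. The helper names mapped_memory_mb and
windows_heap_mb are made up for this illustration and are not part of Neo4j,
and the 6G base is simply the -Xmx value quoted above.

```python
import re

# Mapped-memory settings copied from the neo4j.properties listing above.
PROPERTIES = """\
# Default values for the low-level graph engine
neostore.nodestore.db.mapped_memory=200M
neostore.relationshipstore.db.mapped_memory=1G
neostore.propertystore.db.mapped_memory=500M
neostore.propertystore.db.strings.mapped_memory=250M
neostore.propertystore.db.arrays.mapped_memory=0M
"""

# Assumption: binary units, so 1G counts as 1024M.
UNIT_TO_MB = {"M": 1, "G": 1024}


def mapped_memory_mb(properties_text):
    """Sum every *.mapped_memory value in the text, in MB (hypothetical helper)."""
    total = 0
    for line in properties_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        if not key.strip().endswith(".mapped_memory"):
            continue  # ignore unrelated settings
        match = re.fullmatch(r"(\d+)([MG])", value.strip().upper())
        if match is None:
            raise ValueError("unrecognised size: " + value)
        total += int(match.group(1)) * UNIT_TO_MB[match.group(2)]
    return total


def windows_heap_mb(base_heap_mb, properties_text):
    """Base heap plus the mapped memory, per the Windows rule described above."""
    return base_heap_mb + mapped_memory_mb(properties_text)


if __name__ == "__main__":
    print("mapped memory: %d MB" % mapped_memory_mb(PROPERTIES))
    print("Windows heap:  %d MB" % windows_heap_mb(6 * 1024, PROPERTIES))
```

With these values the mapped memory comes to a bit under 2G, so a 6G heap on
macOS would correspond to roughly 8G on Windows, if the machine has that much
RAM to spare.
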
USING PERIODIC COMMIT 10000
> LOAD CSV
> FROM
> "file:///Users/mh/Downloads/Active_Corporations___Beginning_1800_no_head.csv"
> AS company
> CREATE (:DataActiveCorporations
> {
> DOS_ID:company[0],
> Current_Entity_Name:company[1],
> Initial_DOS_Filing_Date:company[2],
......
> Registered_Agent_Zip:company[23],
>
> Location_Name:company[24],
> Location_Address_1:company[25],
> Location_Address_2:company[26],
> Location_City:company[27],
> Location_State:company[28],
> Location_Zip:company[29]
> }
> );
+-------------------+
| No data returned. |
+-------------------+
Nodes created: 1964486
Properties set: 58934580
Labels added: 1964486
391059 ms
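
For comparison with the 4550855 ms reported in the original post, a rough
throughput calculation in Python. This is only a sketch: the helper name
rows_per_second is made up for the illustration, and the two runs differ in
OS, machine, commit size and memory settings, so the ratio is indicative
rather than a controlled benchmark.

```python
ROWS = 1964486            # "Nodes created" in both runs
PROPERTIES_SET = 58934580  # "Properties set" in both runs
ORIGINAL_MS = 4550855     # Windows 7 run, PERIODIC COMMIT 500
TUNED_MS = 391059         # macOS run, PERIODIC COMMIT 10000


def rows_per_second(rows, elapsed_ms):
    """Average import rate in rows per second (hypothetical helper)."""
    return rows / (elapsed_ms / 1000.0)


original_rate = rows_per_second(ROWS, ORIGINAL_MS)
tuned_rate = rows_per_second(ROWS, TUNED_MS)
speedup = ORIGINAL_MS / TUNED_MS
properties_per_node = PROPERTIES_SET // ROWS

if __name__ == "__main__":
    print("original: %.0f rows/s" % original_rate)
    print("tuned:    %.0f rows/s" % tuned_rate)
    print("speedup:  %.1fx" % speedup)
    print("properties per node: %d" % properties_per_node)
```
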
On 05.03.2014 at 08:34, Michael Hunger
<[email protected]> wrote:
> Oh, and if you use neo4j-shell without a server, you have to set the heap in
> bin\Neo4jShell.bat via EXTRA_JVM_ARGUMENTS="-Xmx4G -Xms4G -Xmn1G"
>
> and call
>
> bin\Neo4jShell -conf conf\neo4j.properties -path data\graph.db
>
> On 05.03.2014 at 08:29, Michael Hunger
> <[email protected]> wrote:
>
>> Yep,
>>
>> it would also be interesting to know how you ran this. With neo4j-shell?
>> Against a running server?
>> Did you configure any RAM or memory-mapping settings in neo4j.properties?
>>
>> Check out this blog post for some hints on memory config:
>> http://blog.bruggen.com/2014/02/some-neo4j-import-tweaks-what-and-where.html?view=sidebar
>> Note that on Windows the heap settings include the mmio settings, unlike
>> on other OSes.
>>
>> Michael
>>
>> On 04.03.2014 at 17:22, Mark Needham <[email protected]> wrote:
>>
>>> Hi Aram,
>>>
>>> * Do you have any other information on the spec of the machine you're
>>> running this on, e.g. how much RAM?
>>> * Have you tried upping the value of PERIODIC COMMIT? Perhaps try it out
>>> with a smaller subset of the data to measure the impact, e.g. with
>>> values of 1,000 and 10,000.
>>> * I think it would be interesting to pull out some other things as nodes as
>>> well - might lead to more interesting queries e.g. CEO, Location,
>>> Registered Agent, DOS Process, Jurisdiction could all be nodes that link
>>> back to a DOS.
>>>
>>> Let me know if any of that doesn't make sense.
>>> Mark
>>>
>>>
>>> On 4 March 2014 15:54, Aram Chung <[email protected]> wrote:
>>> Hi,
>>>
>>> I was asked to post this here by Mark Needham (@markhneedham) who thought
>>> my query took longer than it should.
>>>
>>> I'm trying to see how graph databases could be used in investigative
>>> journalism: I was loading in New York State's Active Corporations:
>>> Beginning 1800 data from
>>> https://data.ny.gov/Economic-Development/Active-Corporations-Beginning-1800/n9v6-gdp6
>>> as a 1964486-row csv (and deleted all U+F8FF characters, because I was
>>> getting "[null] is not a supported property value"). The Cypher query I
>>> used was
>>>
>>> USING PERIODIC COMMIT 500
>>> LOAD CSV
>>> FROM
>>> "file://path/to/csv/Active_Corporations___Beginning_1800__without_header__wonky_characters_fixed.csv"
>>> AS company
>>> CREATE (:DataActiveCorporations
>>> {
>>> DOS_ID:company[0],
>>> Current_Entity_Name:company[1],
>>> Initial_DOS_Filing_Date:company[2],
>>> County:company[3],
>>> Jurisdiction:company[4],
>>> Entity_Type:company[5],
>>>
>>> DOS_Process_Name:company[6],
>>> DOS_Process_Address_1:company[7],
>>> DOS_Process_Address_2:company[8],
>>> DOS_Process_City:company[9],
>>> DOS_Process_State:company[10],
>>> DOS_Process_Zip:company[11],
>>>
>>> CEO_Name:company[12],
>>> CEO_Address_1:company[13],
>>> CEO_Address_2:company[14],
>>> CEO_City:company[15],
>>> CEO_State:company[16],
>>> CEO_Zip:company[17],
>>>
>>> Registered_Agent_Name:company[18],
>>> Registered_Agent_Address_1:company[19],
>>> Registered_Agent_Address_2:company[20],
>>> Registered_Agent_City:company[21],
>>> Registered_Agent_State:company[22],
>>> Registered_Agent_Zip:company[23],
>>>
>>> Location_Name:company[24],
>>> Location_Address_1:company[25],
>>> Location_Address_2:company[26],
>>> Location_City:company[27],
>>> Location_State:company[28],
>>> Location_Zip:company[29]
>>> }
>>> );
>>>
>>> Each row is one node so it's as close to the raw data as possible. The idea
>>> is loosely that these nodes will be linked with new nodes representing
>>> people and addresses verified by reporters.
>>>
>>> This is what I got:
>>>
>>> +-------------------+
>>> | No data returned. |
>>> +-------------------+
>>> Nodes created: 1964486
>>> Properties set: 58934580
>>> Labels added: 1964486
>>> 4550855 ms
>>>
>>> Some context information:
>>> Neo4j Milestone Release 2.1.0-M01
>>> Windows 7
>>> java version "1.7.0_03"
>>>
>>> Best,
>>> Aram
>>>
>>> --
>>> You received this message because you are subscribed to the Google Groups
>>> "Neo4j" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an
>>> email to [email protected].
>>> For more options, visit https://groups.google.com/groups/opt_out.
>>>
>>
>