Hi Michael,

I tried with 2.3.2, starting with a fresh db that had 10 nodes in it. I then 
ran the first command to import 5 million nodes from CSV. This took 12 
minutes, and when it finished it was using 1.6GB of memory. Size on disk was 
2.5GB.

I ran the second command and it created the 5 million edges in 8 minutes, 
after which it was using 1.8GB of memory and size on disk was 3.32GB. A few 
minutes later memory usage went down to 1.3GB.

Next I ran the first command again on another CSV file, which also contained 
5 million events. It took 15 minutes to create the nodes, after which it was 
using 2.2GB of memory and size on disk was 5.9GB.

When I ran the second command on this file it completed in 8 minutes and 
was still using 2.2GB of memory. Size on disk was at 6.8GB.

After that I ran another command, similar to the second one, which creates 
another edge for each node. It completed in 8 minutes and memory was at 
2.3GB.

So up to now it does seem to be a bit better in that it doesn't stall.
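
These runs were all with the default memory settings. If tuning them would 
help, I'm guessing at something like the following for 2.3 (the property 
names are what I found in the docs; the values are just my assumption for a 
4GB machine, and I haven't tried them yet):

  # conf/neo4j.properties - memory for the page cache (store files)
  dbms.pagecache.memory=1500m

  # conf/neo4j-wrapper.conf - JVM heap size in MB
  wrapper.java.initmemory=1024
  wrapper.java.maxmemory=1024

Does that split between heap and page cache sound reasonable for this 
workload?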

When I prefix the second command with EXPLAIN this is what I'm getting:

Compiler CYPHER 2.3
Planner RULE
Runtime INTERPRETED

+--------------+-----------------------+-------------------------------+
| Operator     | Identifiers           | Other                         |
+--------------+-----------------------+-------------------------------+
| +EmptyResult |                       |                               |
| |            +-----------------------+-------------------------------+
| +Merge(Into) | anon[167], e, f, line | (e)-[:FOR]->(f)               |
| |            +-----------------------+-------------------------------+
| +SchemaIndex | e, f, line            | line.eventID; :EVENT(eventID) |
| |            +-----------------------+-------------------------------+
| +SchemaIndex | f, line               | line.name; :Feature(name)     |
| |            +-----------------------+-------------------------------+
| +LoadCSV     | line                  |                               |
+--------------+-----------------------+-------------------------------+

Total database accesses: ?
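
Since it's picking the RULE planner, I could also try forcing the cost 
planner next and compare plans, something like this (assuming the CYPHER 
planner=cost prefix applies on 2.3 the way I've read; do correct me if the 
syntax is off):

  CYPHER planner=cost EXPLAIN
  LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
  MATCH (f:Feature) WHERE f.name = line.name
  MATCH (e:EVENT) WHERE e.eventID = line.eventID
  MERGE (e)-[:FOR]->(f);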


Regards,

Arielle

On Wednesday, January 27, 2016 at 5:29:52 PM UTC+1, Michael Hunger wrote:
>
> Can you try it on 2.3.2 too?
> In general your code looks ok. Can you share your query plan?
> Prefix your query with EXPLAIN and remove the USING PERIODIC COMMIT to see 
> the plan.
>
> How big is your neo4j store on disk?
>
> Michael
>
>
> On 27.01.2016 at 13:29, Arielle Bonnici <[email protected]> wrote:
>
> I'm currently running a test with Neo4j CE 2.3.1 on a Windows 7 machine 
> with 4GB memory and trying to understand how to manage memory allocation 
> when importing from CSV using the Neo4jShell.
>
> I am running these two commands, the first one to create the nodes and the 
> second one to create edges (one edge for each node).
>
> USING PERIODIC COMMIT 10000
> LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line
> CREATE (:EVENT { eventID: line.eventID, name: line.name, referrer: 
> line.referrer, sessionID: toInt(line.sessionID), timestamp: 
> toInt(line.timestamp), pID: toInt(line.pID)});
>
> USING PERIODIC COMMIT 10000
> LOAD CSV WITH HEADERS FROM 'file:///C:\\seq.csv' AS line 
> MATCH (f:Feature)
> WHERE f.name = line.name
> MATCH (e:EVENT) 
> WHERE e.eventID = line.eventID
> MERGE (e)-[:FOR]->(f);
>
> I have the following related indexes and constraints:
>
> Indexes                                                          
>   ON :EVENT(eventID) ONLINE (for uniqueness constraint) 
>   ON :Feature(name)  ONLINE (for uniqueness constraint) 
>
> Constraints
>   ON (feature:Feature) ASSERT feature.name IS UNIQUE
>   ON (event:EVENT) ASSERT event.eventID IS UNIQUE
>
> When I have 5 million nodes in the db and try to load a CSV that has 
> another 5 million nodes, it takes about 15 minutes to complete and gets to 
> ~1.5GB memory usage. If I immediately run the second command to create the 
> edges, the memory starts going up again and sometimes it stalls at some 
> point. To get the second command to complete, I have to restart Neo4j. 
>
> I'm trying to understand if I can improve this by optimizing the commands 
> somehow, or if specifying memory settings in the properties file might 
> help. If so, how best to go about that?
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected].
> For more options, visit https://groups.google.com/d/optout.
>
>
>

