And here the problem arrives: the same input .csv file described above,
nodes already imported, *now establishing relationships*. Second pass over
the same file. The query is fairly simple:
//
// I *do* know that Neo4j is very sensitive to batch size when importing
// relationships, so 50-100 is more than enough
//
USING PERIODIC COMMIT 50
LOAD CSV FROM "file:///home/stesin/Documents/step1/3_generic/50_main.csv" AS row
MATCH
  (src:LeftLabel { p1: row[0] }),
  (dst:RightLabel { p2: row[2], coll1: SPLIT(row[3], ':'), future_labels_coll: SPLIT(row[4], ':') })
USING INDEX src:LeftLabel(p1)
USING INDEX dst:RightLabel(p2)
MERGE
  (src)-[r:IS_RELATED_TO { coll3: SPLIT(row[1], ':') }]->(dst)
ON CREATE SET
  ...just SETting a bunch of properties r.some_payload here...
;
The profiler assured me that both *src* and *dst* nodes would be accessed
via fast lookups in the corresponding indices. OK so far.
I freed almost all of my 16 GB of RAM, configured a 4 GB heap for the Neo4j
JVM, increased the MMIO buffers 2.5x, and made sure there was no
auto-indexing.
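For reference, the knobs involved live in the two config files of a
Neo4j 2.1-era install; the sketch below is illustrative only -- the exact
values are assumptions, not copied from my actual config:

```
# conf/neo4j-wrapper.conf -- JVM heap (assumed 4 GB, matching the text above)
wrapper.java.initmemory=4096
wrapper.java.maxmemory=4096

# conf/neo4j.properties -- memory-mapped I/O buffer sizes (illustrative numbers)
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=2G
neostore.propertystore.db.mapped_memory=500M

# conf/neo4j.properties -- make sure auto-indexing is off
node_auto_indexing=false
relationship_auto_indexing=false
```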
Then I opened http://localhost:7474/webadmin/# and fed the script to
neo4j-shell.
The first import attempt got stuck when webadmin reported some 380,000+
relationships in the db. No diagnostics, complete silence in the logs, but
the neo4j server just stopped responding both to webadmin and to new shell
connections. After I restarted the server (it was shut down brutally after
failing to close gracefully), webadmin told me there were only about
16k+ relationships in the db.
I tried once more with the same result; it just went south much earlier, at
about 190k+ relationships.
This is the very same behavior we observed with early 2.1 releases! So what
I did was apply an old proven trick:
bash$ split -a 3 -l 10000 50_main.csv 50_main.
I got a bunch of 10000-line files named from 50_main.aaa to 50_main.ajo (a
total of 249 files), then wrote a bash script:
#!/bin/bash
for file_name in 50_main.a*
do
    echo -n "${file_name} ... "
    qry="USING PERIODIC COMMIT 50"
    qry="$qry LOAD CSV FROM \"file:///home/stesin/Documents/step1/3_generic/${file_name}\" AS row"
    qry="$qry MATCH"
    ...
    # ... line by line I built the whole query...
    neo4j-shell -c "$qry" || exit 1
    echo -n "sleeping... "
    sleep 5
    echo "done."
done
This worked: each 10000-line file took between 1600 and 2600 ms (say ~8
seconds per file including the sleep 5), and the whole import completed
OK in about half an hour.
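The split step itself is easy to check on synthetic data (a hedged sketch;
demo.csv and the scratch directory are made up for illustration):

```shell
#!/bin/bash
# Build a synthetic 25,000-line CSV and split it the same way as 50_main.csv.
cd "$(mktemp -d)"
seq 1 25000 > demo.csv

# -a 3: three-character alphabetic suffixes; -l 10000: 10000 lines per chunk.
split -a 3 -l 10000 demo.csv demo.

ls demo.a??        # the three chunk files: demo.aaa demo.aab demo.aac
wc -l demo.aaa     # full chunks hold exactly 10000 lines
wc -l demo.aac     # the last chunk holds the 5000-line remainder
```

With three-character suffixes the names run aaa, aab, ... so 249 chunks end
exactly at .ajo, matching the file names above.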
I think this may be some kind of TCP/IP (HTTP) session timeout issue: when
some operation (transaction) takes too long, one side closes its end of the
session while the other side (the server?) continues to think the session
is still up.
Otherwise, why does the very same data, cut into 10000-line pieces, import
OK, while the complete file does not?
WBR,
Andrii
--
You received this message because you are subscribed to the Google Groups
"Neo4j" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.