We're loading 20 million abstracts into Neo4J.  The rate has been about 2.5 
million abstracts per week.  For comparison, we can load all 20 million 
abstracts into Solr Cloud in less than 24 hours.  The abstracts average 
about 400-500 words each.  For each abstract, we have 5 additional entity 
nodes, each linked to the abstract by a relationship.  We're looking for 
any advice on speeding up load times in Neo4J.
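For scale, here is a small Python calculation that uses only the numbers 
above.  It assumes the Solr load takes the full 24 hours, so the Solr rate 
is a lower bound.  It also counts every entity node as a write even though 
merge() will reuse existing ones, so the graph-write figure is an upper 
bound.

```python
# Back-of-the-envelope throughput comparison using only the figures in this post.
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY

TOTAL_ABSTRACTS = 20_000_000
NEO4J_PER_WEEK = 2_500_000       # observed Neo4j load rate
SOLR_SECONDS = SECONDS_PER_DAY   # "less than 24 hours", so this bounds the Solr rate from below
ENTITIES_PER_ABSTRACT = 5

neo4j_rate = NEO4J_PER_WEEK / SECONDS_PER_WEEK        # abstracts per second
solr_rate = TOTAL_ABSTRACTS / SOLR_SECONDS            # abstracts per second (lower bound)
weeks_for_full_load = TOTAL_ABSTRACTS / NEO4J_PER_WEEK

# Upper bound: 1 abstract node + 5 entity nodes + 5 relationships per abstract.
writes_per_abstract = 1 + ENTITIES_PER_ABSTRACT + ENTITIES_PER_ABSTRACT

print(f"Neo4j: {neo4j_rate:.1f} abstracts/s")
print(f"Solr:  >= {solr_rate:.0f} abstracts/s")
print(f"Solr is >= {solr_rate / neo4j_rate:.0f}x faster")
print(f"Full Neo4j load: {weeks_for_full_load:.0f} weeks")
print(f"Graph writes: <= {neo4j_rate * writes_per_abstract:.0f} per second")
```

At roughly 4 abstracts (at most about 45 node and relationship writes) per 
second, the load is slow enough that per-operation overhead, such as how 
each merge() locates existing nodes, seems a more likely suspect than raw 
hardware limits.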

In our attempts to get better performance when ingesting the abstracts, we 
have tried combinations of py2neo version 2 and 3, and Neo4J version 2 and 
3 Enterprise.  Our platform is a 2 processor 12 core Linux server with 32GB 
of memory.  We use the default Neo4J configuration.  We prefer merge() to 
ensure only one node per unique article ID, but we have also tried 
create().  We use batches of 1000 articles.  We minimize round trips to the 
server with transactions: first the entities, then the relationships.  No 
find() or find_one() calls are necessary.  The script itself runs quickly 
and then lingers during the commit, suggesting that the slowdown is coming 
from the Neo4J server.  During our trials, we discovered and reported that 
py2neo 3 hangs indefinitely 99% of the time for merge() with Bolt 
transactions.  It also hangs with HTTP transactions, but rarely.
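A commonly recommended pattern for this kind of load is to create a 
uniqueness constraint on the merged key first (MERGE without an index on 
that property has to scan every node with the label) and then send each 
batch as one parameterized Cypher statement that uses UNWIND over a list 
of rows, handling nodes and relationships together.  The sketch below only 
builds the statements and parameter rows in plain Python.  The labels 
(Abstract, Entity), relationship type (MENTIONS) and property names (id, 
text, name) are hypothetical; the constraint syntax is the Neo4j 2.x/3.x 
form, and the $rows parameter syntax needs Neo4j 3.0 or later (2.x writes 
it as {rows}).

```python
from itertools import islice
from typing import Iterable, Iterator, List

# Hypothetical schema: (:Abstract {id, text})-[:MENTIONS]->(:Entity {name}).
# Run these once before loading (Neo4j 2.x/3.x syntax). Without them, each
# MERGE has to scan every node with the label to find an existing match.
CONSTRAINTS = [
    "CREATE CONSTRAINT ON (a:Abstract) ASSERT a.id IS UNIQUE",
    "CREATE CONSTRAINT ON (e:Entity) ASSERT e.name IS UNIQUE",
]

# One parameterized statement per batch: UNWIND expands the $rows list, so a
# batch of 1000 abstracts plus their entities is a single round trip and commit.
LOAD_BATCH = """
UNWIND $rows AS row
MERGE (a:Abstract {id: row.id})
SET a.text = row.text
WITH a, row
UNWIND row.entities AS name
MERGE (e:Entity {name: name})
MERGE (a)-[:MENTIONS]->(e)
"""


def to_row(abstract_id: str, text: str, entities: List[str]) -> dict:
    """Shape one abstract into the parameter row LOAD_BATCH expects."""
    return {"id": abstract_id, "text": text, "entities": list(entities)}


def batches(rows: Iterable[dict], size: int = 1000) -> Iterator[List[dict]]:
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk
```

Each batch would then go out in a single call, for example with py2neo 3 
something along the lines of graph.run(LOAD_BATCH, rows=batch); treat the 
exact call as an assumption to check against the py2neo documentation.  
Because MERGE matches on the constrained keys, re-running a batch or 
loading overlapping sources should not create duplicate nodes, which also 
suits loading into a database that is not wiped first.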

Once we get a reasonable single-threaded ingestion rate, we can consider 
running the load in parallel, but since Neo4J is single-threaded when 
updating (is that correct?), we're not sure it will help much.
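On the parallelism question: the Neo4j server does accept concurrent write 
transactions and serializes conflicting ones through locks, so parallel 
loading can help, although batches that attach relationships to the same 
popular entity node may wait on each other or occasionally deadlock and 
need a retry.  Below is a minimal Python sketch of fanning batches out over 
a thread pool.  write_batch is a hypothetical stand-in that only counts 
rows; a real loader would run one batched write transaction per call.  The 
default of 4 workers is an arbitrary starting point, not a tuned value.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def batches(records, size=1000):
    """Yield successive lists of at most `size` records."""
    it = iter(records)
    while chunk := list(islice(it, size)):
        yield chunk


def write_batch(batch):
    """Hypothetical stand-in for one write transaction.

    A real loader would open a session here and run the batched MERGE
    statement; this version only reports how many rows it would have sent.
    """
    return len(batch)


def parallel_load(records, workers=4, size=1000):
    """Send batches to `workers` concurrent writers; return total rows written."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(write_batch, batches(records, size)))
```

Threads are adequate here because each worker mostly waits on the server 
rather than on Python code.  Timing a few different worker counts would 
show fairly quickly whether lock contention on shared entity nodes limits 
the gain.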

Eventually we will be loading from a variety of sources in parallel, so we 
must avoid solutions that wipe the Neo4J database first.

Has anyone else experienced such slow load times?  Are there any best 
practices we've overlooked (other than writing directly in Java) that might 
help increase load performance?

-- 
You received this message because you are subscribed to the Google Groups 
"Neo4j" group.