Hi Michael - I'm in the same project as Guan.

Use case:  continuous feed of nodes and relationships, millions per day, 
7x24.  Nodes range from 150 to 10,000 bytes.  Nodes and relationships 
come from multiple data sources, and relationships can link any two 
nodes, regardless of source.

OS:  our integration, testing, and production environments are RHEL Linux 
with 44 GB RAM dedicated to the embedded Neo4j app server.  Our dev 
workstations are OS X with 16 GB RAM plus miscellaneous IDEs, and we can 
push relatively high volume on those.

The latest Oracle JRE runs on all of the above, tuned for high memory 
usage and concurrent GC.
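
For concreteness, the tuning amounts to flags along these lines 
(illustrative values only, not our exact production settings):

```
java -server -Xms24g -Xmx24g \
     -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
     -XX:+CMSParallelRemarkEnabled
```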

App container:  Mule 3.4.0 with two dedicated flows:  one for node commits, 
one for relationship commits.

Nodes and relationships are fed from the high-volume cluster via 
JMS/ActiveMQ.  The Neo4j app server only does two things:  subscribes to 
ActiveMQ and commits data to the DB.  No other activity or application 
runs there.  There are separate queues for nodes and for relationships, 
one each, no conversations.
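
Conceptually, each flow drains its queue and commits in large batches 
rather than one transaction per message.  A stripped-down sketch -- the 
commit callback stands in for the embedded-Neo4j transaction, and the 
class and names are illustrative, not our actual Mule flow:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of a flow: accumulate incoming messages and commit them in
// large batches instead of one transaction per message. The commit
// callback is a stand-in for an embedded-Neo4j transaction.
public class BatchCommitter<T> {
    private final int batchSize;
    private final Consumer<List<T>> commit;
    private final List<T> buffer = new ArrayList<>();
    private int commits = 0;

    public BatchCommitter(int batchSize, Consumer<List<T>> commit) {
        this.batchSize = batchSize;
        this.commit = commit;
    }

    public void onMessage(T msg) {
        buffer.add(msg);
        if (buffer.size() >= batchSize) flush();
    }

    // One "transaction" per full batch; call once more at shutdown
    // to drain the tail.
    public void flush() {
        if (buffer.isEmpty()) return;
        commit.accept(new ArrayList<>(buffer));
        buffer.clear();
        commits++;
    }

    public int commitCount() { return commits; }

    public static void main(String[] args) {
        List<Integer> committed = new ArrayList<>();
        BatchCommitter<Integer> bc =
            new BatchCommitter<>(3, committed::addAll);
        for (int i = 0; i < 7; i++) bc.onMessage(i);
        bc.flush(); // drain the tail
        System.out.println(committed.size() + " messages in "
            + bc.commitCount() + " commits");
    }
}
```

The batch size is a knob; Michael's suggestion below of 30-50k elements 
per commit is much larger than the toy value here.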

Topology:  http://eugeneciurana.com/personal/images/Neo4J-topology.png

Neo4j version:  1.9.4

Now -- the main issue we experience is that none of us has a full picture 
of how to tune Neo4j.  Guan has been doing a great job of unearthing the 
details, but we seem to have hit a wall.  We're able to push 1-10 million 
nodes or relationships, then we see memory exceptions (Guan can explain 
more -- hopefully he'll see this later).  After checking the servers via 
JMX and other instrumentation, we see that processor and memory usage are 
super lean, and that the machine is mostly idling.  Having tuned (poorly, 
I admit it) stand-alone Neo4j + REST a few weeks ago, I figured that we 
somehow need to tell the embedded Neo4j how to map memory -- and to solve 
some of the other issues that Guan mentioned in his original post.
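
By "map memory" I mean something along these lines -- the 1.9-era 
memory-mapping settings, with placeholder values that would still need 
sizing against our actual store files:

```
# conf/neo4j.properties (illustrative values only)
neostore.nodestore.db.mapped_memory=2G
neostore.relationshipstore.db.mapped_memory=8G
neostore.propertystore.db.mapped_memory=4G
neostore.propertystore.db.strings.mapped_memory=2G
neostore.propertystore.db.arrays.mapped_memory=512M
```

If I understand the embedded API correctly, these would be handed to the 
GraphDatabaseBuilder (e.g. via loadPropertiesFromFile()) rather than read 
from a server config file.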

So -- the app server has all the resources it needs.  The Java container 
has more than enough memory.  Everything (JVM, OS, supporting libraries, 
etc.) is up-to-date.
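
One idea we've been kicking around for the locking exceptions Guan 
described:  shard relationship commits so that parallel writers never 
contend on the same node locks.  Strictly a sketch of our own routing 
idea -- the names are hypothetical and nothing here is Neo4j API:

```java
// Sketch only: shard relationship commits so concurrent writers touch
// disjoint sets of nodes. A relationship write locks both endpoint
// nodes, so a relationship is only safe for a parallel writer when both
// endpoints hash to the same shard; anything crossing shards falls back
// to a single serial overflow queue.
public class RelationshipRouter {
    public static final int PARTITIONS = 4;
    public static final int OVERFLOW = PARTITIONS; // serial fallback queue

    public static int partitionFor(String nodeId) {
        return Math.floorMod(nodeId.hashCode(), PARTITIONS);
    }

    // Queue index for a relationship between two nodes.
    public static int routeFor(String startNodeId, String endNodeId) {
        int p1 = partitionFor(startNodeId);
        int p2 = partitionFor(endNodeId);
        return (p1 == p2) ? p1 : OVERFLOW;
    }

    public static void main(String[] args) {
        String[][] rels = { {"a", "e"}, {"b", "f"}, {"a", "b"} };
        for (String[] rel : rels) {
            System.out.println(rel[0] + "->" + rel[1]
                + " routed to queue " + routeFor(rel[0], rel[1]));
        }
    }
}
```

The serial overflow queue is the price of cross-shard relationships; 
whether this is a net win depends on how often relationships cross 
shards in the real data.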

Thanks in advance for your help -- we look forward to hearing from you.

(We intend to cluster at some point in the future...  not yet, though.  We 
can't even get this working well with a single instance.  You may ignore 
the "read only" instances in the diagram.)

Cheers!

pr3d4t0r
----
On Tuesday, December 17, 2013 3:11:28 PM UTC-6, Michael Hunger wrote:
>
> What is your use-case? What is the large amount of data you're writing to 
> the graph?
> What OS are you working on?
> And what Neo4j version?
>
> Increasing the memory-mapping settings also helps with writes, esp. the 
> settings for the nodestore and relationship store; the more of those that 
> can be memory-mapped, the more can be written in parallel.
>
> Neo4j supports concurrent writes, but it 
>
> #1 serializes commits on writing to the transaction log
> #2 locks nodes and relationships if you change properties 
> #3 locks both nodes if you add a relationship
>
> Common practice is to use a large enough tx size (e.g. 30-50k elements) 
> per commit, and also to aggregate updates so that they write to 
> different, disjoint subgraphs of the data.
>
> HTH,
>
>
> Michael
>
>
> otherwise see the blog posts referred to from: 
> http://neo4j.org/develop/import
>
>
> On 17.12.2013 at 19:49, Guan Guan <[email protected]> wrote:
>
> Hi,
>
>
> In our use case, we need to do a lot of data importing/updating every 
> day (billions of nodes/relationships). 
>
>
> What's the best way to tune the configuration to boost performance?
>
>
> Does the kernel config help with data ingestion? Settings like 
> '*neostore.propertystore.db.strings.mapped_memory*' are for the query 
> cache only, am I correct? Do these parameters help with data import 
> performance? 
>
>
>
> One more question:  how does embedded Neo4j handle concurrent data 
> imports? I have multiple threads writing to the embedded database at the 
> same time, and I always get locking exceptions where multiple threads try 
> to lock the same relationship. What's the recommended way to ingest data 
> with multiple threads?
>
>
>
> Thanks,
>
>
> Guan
>
> -- 
> You received this message because you are subscribed to the Google Groups 
> "Neo4j" group.
> To unsubscribe from this group and stop receiving emails from it, send an 
> email to [email protected] <javascript:>.
> For more options, visit https://groups.google.com/groups/opt_out.
>
>
>
