sanastas commented on issue #5698: Oak: New Concurrent Key-Value Map 
URL: 
https://github.com/apache/incubator-druid/issues/5698#issuecomment-504917683
 
 
   @jihoonson thanks for taking a look!
   
   First, this proposal is indeed about supporting OakIncrementalIndex. This is 
an idea we have been pursuing for a while. In general, it is about building 
bigger off-heap IncrementalIndexes while enjoying good performance :) Would you 
like to work with us on promoting this possibility? Even just as a consultant, 
your insights are very valuable.
   
   Second, Oak has a couple of advantages when working with big data. As 
@ebortnik has mentioned, working with off-heap serialized data makes it less 
affected by the JVM GC. In addition, Oak utilizes cache locality for searches. 
Lastly, Oak works well under multi-threaded contention and scales well with 
multiple threads. However, in this specific experiment (single thread) the main 
problem is most likely caused by GC.
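   
   To make the off-heap point concrete, here is a minimal sketch of keeping 
serialized values in a single direct buffer instead of as many small on-heap 
objects. This is not Oak's API: the class `OffHeapSketch` and its methods are 
hypothetical names used only for illustration, and it assumes a fixed-capacity, 
append-only, single-threaded store.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is NOT Oak's API.
final class OffHeapSketch {
    // One direct (off-heap) buffer holds all serialized values, so the GC
    // sees a single small ByteBuffer object rather than one object per value.
    private final ByteBuffer buffer;

    OffHeapSketch(int capacityBytes) {
        this.buffer = ByteBuffer.allocateDirect(capacityBytes);
    }

    // Appends a length-prefixed UTF-8 value and returns the offset it starts at.
    int put(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        int offset = buffer.position();
        buffer.putInt(bytes.length);
        buffer.put(bytes);
        return offset;
    }

    // Reads the value stored at the given offset without moving the write position.
    String get(int offset) {
        int length = buffer.getInt(offset);
        byte[] bytes = new byte[length];
        ByteBuffer view = buffer.duplicate();
        view.position(offset + Integer.BYTES);
        view.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // True when the backing memory lives outside the Java heap.
    boolean isOffHeap() {
        return buffer.isDirect();
    }
}
```

   In this sketch a value is deserialized only when it is read, so the number 
of long-lived on-heap objects does not grow with the number of stored values.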
   
   The original Druid IncrementalIndex allocates the (to-be-added) rows on-heap 
prior to the benchmarks (taking 4GB out of the given 12GB). Then StringIndexer 
takes more memory to save the String<->Integer translation; let's exaggerate 
and give it another 4GB. That leaves 4GB for all other on-heap objects, which 
puts a lot of stress on the GC. The ConcurrentSkipListMap used in 
Druid's IncrementalIndex is known to be less GC-friendly due to the many small 
objects it allocates. I believe this is the reason for the huge performance 
degradation we see.
   
   For example, here is a 
[reference](https://docs.oracle.com/cd/E19159-01/819-3681/6n5srlhqf/index.html) 
from Oracle itself on the cost of the Java garbage collector: 
   > Garbage collection (GC) reclaims the heap space previously allocated to 
objects no longer needed. The process of locating and removing the dead objects 
can stall any application and consume as much as 25 percent throughput.
   
   And 
[here](http://gridgain.blogspot.com/2014/06/jdk-g1-garbage-collector-pauses-for.html)
 is a write-up of experiments with GC and big heap sizes. The 
conclusion is:
   > From conducting numerous tests, we have concluded that unless you are 
utilizing some off-heap technology, no Garbage Collector provided with JDK will 
render any kind of stable GC performance with heap sizes larger that 16GB. For 
example, on 50GB heaps we can often encounter up to 5 minute GC pauses, with 
average pauses of 2 to 4 seconds.
   
   Would be really glad to hear your thoughts!

