sanastas commented on issue #5698: Oak: New Concurrent Key-Value Map URL: https://github.com/apache/incubator-druid/issues/5698#issuecomment-507994908

In the meantime, we would like to share some insights we gained while playing with IncrementalIngestionBenchmark. We continued the experiments that were started and presented above under the "IncrementalIngestionBenchmark" title. We compare Druid's native incremental index with the newly suggested OakIncrementalIndex (data kept off-heap). The data distribution/generation is exactly as in IncrementalIngestionBenchmark; we just insert more rows.

This time we inserted 6 million rows (about 8GB of data) while giving 24GB of RAM. We chose 24GB because it is close to the lowest amount that allows Druid's native IncrementalIndex to run properly. Even taking into consideration that in IncrementalIngestionBenchmark the rows are prepared before the benchmark measurement starts, and thus already occupy a big chunk of on-heap memory, a 3x memory requirement sounds like a lot. Druid's incremental index gets 24GB of on-heap memory; OakIncrementalIndex always gets 16GB on-heap and 8GB off-heap (24GB of RAM in total). The results can be seen in the file below. The graph shows throughput (number of operations per second), so bigger is better. OakIncrementalIndex performs about twice as fast. [Ingestion Throughput 6M rows (8GB data) ingested.pdf](https://github.com/apache/incubator-druid/files/3353954/Ingestion.Throughput.6M.rows.8GB.data.ingested.pdf)

To stress the memory overhead requirement, we ran yet another experiment, this time inserting 7 million rows, which amounts to about 9GB of data. We gradually increased the memory budget and present the throughput as a function of the total RAM used. Results are in the file below. OakIncrementalIndex's off-heap memory requirement was always 9GB, as that is how much data is written there. We started by giving 24GB of total RAM, as this is the point where OakIncrementalIndex was able to operate, although its throughput was very low.
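For readers unfamiliar with the JVM memory split mentioned above (16GB on-heap plus 8GB off-heap): a minimal sketch of how such a budget might be configured. The `-Xmx`/`-Xms` and `-XX:MaxDirectMemorySize` flags are standard HotSpot options; the classpath and main class name below are placeholders, not the actual Druid benchmark invocation.

```shell
# Hypothetical invocation (classpath and class name are placeholders):
# 16GB heap for regular Java objects, 8GB of direct (off-heap) memory
# for Oak's buffers -- 24GB of RAM in total.
java -Xmx16g -Xms16g -XX:MaxDirectMemorySize=8g \
     -cp <benchmark-classpath> IncrementalIngestionBenchmark
```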
Druid's native IncrementalIndex was unable to operate until it was allowed 28GB of on-heap memory. [Ingestion 9GB data into Druid.pdf](https://github.com/apache/incubator-druid/files/3353955/Ingestion.9GB.data.into.Druid.pdf) Does the question of metadata memory overhead concern you? Also, would you be interested in working with bigger IncrementalIndexes, in order to later have fewer flushes to disk (persists), causing fewer merges and thus higher performance?
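The metric plotted in both PDFs is ingestion throughput, i.e. rows (operations) per second. A minimal sketch of how such a number is computed; the loop body is a stand-in for the real per-row insert into the index (not Druid's actual benchmark code):

```java
public class IngestionThroughput {
    // Throughput in rows/second, given a row count and elapsed wall time
    // in nanoseconds.
    static double throughput(long rows, long elapsedNanos) {
        return rows / (elapsedNanos / 1_000_000_000.0);
    }

    public static void main(String[] args) {
        long rows = 6_000_000L;
        long start = System.nanoTime();
        long sink = 0;
        for (long i = 0; i < rows; i++) {
            sink += i; // placeholder for index.add(row) in the real benchmark
        }
        long elapsed = System.nanoTime() - start;
        System.out.println("rows/sec: " + throughput(rows, elapsed) + " " + sink);
    }
}
```

With this definition, ingesting 6 million rows in 3 seconds would report a throughput of 2 million rows/sec; "about twice as fast" in the graphs means roughly double this ratio for the same row count.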
