sanastas commented on issue #5698: Oak: New Concurrent Key-Value Map
URL: https://github.com/apache/incubator-druid/issues/5698#issuecomment-508370588

@jon-wei, thank you, we will try to find something about the TPCH data.

@gianm, great to hear from you, Gian, and thank you for your great input! No doubt query performance is important; we are working on it right now and hope to present the results soon. A system-level test/performance benchmark is no doubt important as well. We are trying to collect information about how it should be run to be convincing for the community.

> It looks like you've been doing a lot of testing with really big incremental indexes, but it's more normal in Druid land to have smaller ones.

There is no intention to force Druid to work with big incremental indexes; we just wanted to show some cases where Oak's advantage is clear. Ingestion with Oak on smaller indexes has the same latency/throughput as with the current IncrementalIndex, but takes less memory and offers the potential for better concurrency if multi-threaded ingestion is used one day.

> One is that bigger indexes take up more memory, and large amounts of memory aren't always available.

OakIncrementalIndex can let you handle more rows with less RAM.

> Another is that querying bigger indexes takes more time

We are currently working on query speed and hope to update you soon. What are the expected query times?

> how long it takes to persist the incremental index to disk, and how long it takes to merge persisted indexes into a final segment at the end of the ingestion cycle.

There can be a trade-off. Assume that, all in all, you process X bytes of data. It can be persisted in chunks of X/10 and then merged 10 times, or alternatively it can be persisted in big chunks like X/2 and merged only twice. I am just exaggerating the numbers, and I am not sure this theory leads to better performance; it is just something to be checked.

Thank you for pointing out the publicly available dataset. We will investigate what we can do.

-------------------------------------------

One additional question: how big are the read-only segments?
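
To make the persist/merge trade-off mentioned above a bit more concrete, here is a rough back-of-the-envelope sketch. It is not Druid or Oak code: the function `persist_plan`, its field names, and the 10 GiB total are all hypothetical, and it only counts chunks. It says nothing about how long persisting or merging actually takes, which is exactly the part that still has to be measured.

```python
def persist_plan(total_bytes: int, chunk_bytes: int) -> dict:
    """Count persisted chunks for a given in-memory index size (hypothetical model)."""
    if total_bytes <= 0 or chunk_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division: a partially filled last chunk is still persisted.
    persists = -(-total_bytes // chunk_bytes)
    return {
        "persists": persists,
        # Assumption: the final merge has to read every persisted chunk once.
        "merge_inputs": persists,
        # Largest index held in memory at any one time.
        "peak_index_bytes": min(chunk_bytes, total_bytes),
    }


X = 10 * 1024**3  # hypothetical total of 10 GiB ingested

small_chunks = persist_plan(X, X // 10)  # persist in chunks of X/10
big_chunks = persist_plan(X, X // 2)     # persist in chunks of X/2

print(small_chunks)
print(big_chunks)
```

Under these assumptions the smaller chunks give more persists and more merge inputs but a lower peak memory footprint, while the bigger chunks give the opposite. Whether the second option ends up faster overall is an open question.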
