sanastas commented on issue #5698: Oak: New Concurrent Key-Value Map 
URL: 
https://github.com/apache/incubator-druid/issues/5698#issuecomment-508370588
 
 
   @jon-wei, thank you. We will try to find something about the TPCH data.
   
   @gianm, great to hear from you, Gian, and thank you for your great input!
   
   No doubt query performance is important; we are working on it right now and 
hope to present the results soon. A system-level test/performance benchmark is 
no doubt important as well. We are trying to collect information about how it 
should be run in order to be convincing to the community.
   
   > It looks like you've been doing a lot of testing with really big 
incremental indexes, but it's more normal in Druid land to have smaller ones.
   
   There is no intention to force Druid to work with big incremental indexes; 
we just wanted to show some cases where Oak's advantage is clear. Ingestion 
with Oak on smaller indexes has the same latency/throughput as with the current 
IncrementalIndex, but it takes less memory and offers the potential for better 
concurrency if multi-threaded ingestion is used one day.
   
   > One is that bigger indexes take up more memory, and large amounts of 
memory aren't always available.
   
   OakIncrementalIndex can let you handle more rows with less RAM.
   
   > Another is that querying bigger indexes takes more time
   
   We are currently working on query speed and hope to update you soon. What 
are the expected query times?
   
   > how long it takes to persist the incremental index to disk, and how long 
it takes to merge persisted indexes into a final segment at the end of the 
ingestion cycle.
   
   There can be a trade-off here. Assume that, all in all, you process X bytes 
of data. It can be persisted in chunks of X/10 and then merged 10 times, or 
alternatively it can be persisted in big chunks like X/2 and merged only twice. 
I am just exaggerating the numbers, and I am not sure this theory would show 
better performance in practice. It is just something to be checked.
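   
   As a rough illustration of that arithmetic only, here is a minimal sketch in 
Java. The class and method names are hypothetical and are not part of Druid or 
Oak, the value chosen for X is arbitrary, and the sketch assumes each persisted 
chunk becomes exactly one input to the final merge. It counts chunks and says 
nothing about the actual persist or merge cost, which still has to be measured.

```java
// Hypothetical back-of-the-envelope model; not Druid or Oak code.
class PersistMergeTradeoff {

    /**
     * Number of persisted chunks produced when totalBytes of data is
     * persisted in pieces of at most chunkBytes each. Under the assumption
     * stated above, this is also the number of inputs to the final merge.
     */
    static long persistedChunks(long totalBytes, long chunkBytes) {
        if (totalBytes < 0 || chunkBytes <= 0) {
            throw new IllegalArgumentException("totalBytes must be >= 0 and chunkBytes > 0");
        }
        // Ceiling division: a partially filled last chunk still gets persisted.
        return (totalBytes + chunkBytes - 1) / chunkBytes;
    }

    public static void main(String[] args) {
        long x = 1_000_000_000L; // assumed total ingested bytes (X), arbitrary
        for (long divisor : new long[] {10, 2}) {
            long chunkBytes = x / divisor; // bytes held in memory before each persist
            System.out.printf(
                "chunk = X/%d: %d bytes in memory per persist, %d chunks to merge%n",
                divisor, chunkBytes, persistedChunks(x, chunkBytes));
        }
    }
}
```

   The two sides of the trade-off show up as the two printed quantities: a 
larger chunk needs more memory per persist but leaves fewer chunks to merge.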
   
   Thank you for pointing out the publicly available dataset. We will 
investigate what we can do with it.
   
   -------------------------------------------
   
   An additional question: how big are the read-only segments?
   
   

