gianm opened a new issue #6743: IncrementalIndex generally overestimates theta 
sketch size
URL: https://github.com/apache/incubator-druid/issues/6743
 
 
   Theta sketches have a very large maximum size by default relative to typical 
row sizes (about 250KB with "size" set to the default of 16384). The 
ingestion-time row size estimator (getMaxBytesPerRowForAggregators in 
OnheapIncrementalIndex) uses this maximum as the per-row estimate whenever 
theta sketches are built at ingestion time, even though most sketches hold far 
fewer entries than the maximum. The in-memory index therefore appears full much 
sooner than it really is, and we spill far more often than necessary. It would 
be better to base the estimate on the sketch's actual current size. I'm not 
sure how to get this, though.
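
   To illustrate the gap, here is a rough sketch of the arithmetic in plain Java. The class and method names are hypothetical, and the byte constants (8 bytes per retained hash, a hash table of twice the nominal entries, a 32-byte preamble) are assumptions chosen to approximate the figure above. It does not show how Druid or the sketch library actually computes sizes.

```java
// Illustrative arithmetic only. Class, methods, and constants are assumptions,
// not part of the Druid or DataSketches APIs.
final class ThetaRowSizeSketch {
    // Assumed cost of one retained 64-bit hash value.
    static final int BYTES_PER_HASH = 8;
    // Assumed fixed header size for a sketch.
    static final int PREAMBLE_BYTES = 32;

    // Worst case: a hash table with 2 * nominalEntries slots, plus the header.
    static long worstCaseBytes(int nominalEntries) {
        return 2L * nominalEntries * BYTES_PER_HASH + PREAMBLE_BYTES;
    }

    // Estimate from the number of hashes currently retained,
    // never exceeding the worst case.
    static long currentBytes(int retainedEntries, int nominalEntries) {
        long estimate = (long) retainedEntries * BYTES_PER_HASH + PREAMBLE_BYTES;
        return Math.min(estimate, worstCaseBytes(nominalEntries));
    }

    public static void main(String[] args) {
        int size = 16384; // default "size" mentioned above
        System.out.println("worst case bytes: " + worstCaseBytes(size));
        System.out.println("bytes with 10 retained hashes: " + currentBytes(10, size));
    }
}
```

   With these assumed constants, a sketch holding 10 hashes is estimated at 112 bytes, against 262,176 bytes for the worst case, which is the kind of overestimate described above.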
   
   @leerho - or anyone else - do you have any ideas or suggestions?
