gianm opened a new issue #6743: URL: https://github.com/apache/druid/issues/6743
Theta sketches have a very large max size by default, relative to typical row sizes (about 250KB with "size" set to the default of 16384). The ingestion-time row size estimator (getMaxBytesPerRowForAggregators in OnheapIncrementalIndex) uses this figure to estimate row sizes when theta sketches are used at ingestion time, leading to way more spills than is reasonable. It would be better to use an estimate based more on actual current size. I'm not sure how to get this, though. @leerho - or anyone else - do you have any ideas or suggestions? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
