jihoonson commented on issue #11544: URL: https://github.com/apache/druid/issues/11544#issuecomment-894013851
Yes, I think the problem is too many items per country. Druid uses a fixed-size buffer per row to keep the sketch (`DoublesSketch`). Because the buffer size is fixed and Druid doesn't know the number of items in advance, it sizes the buffer to be large enough for a sketch that has ingested up to one billion items. So when you have fewer than one billion items, the sketch fits in the buffer and everything works well.

The interesting part is when you have more than one billion items. In that case, Druid lets the sketch allocate extra heap memory for the data that doesn't fit in the buffer. However, `DoublesSketch` doesn't behave as we expected: it throws a NullPointerException (NPE) when it tries to allocate more memory. This issue is filed in https://github.com/apache/datasketches-java/issues/358.

As a workaround, you could use other functions to compute approximate quantiles, such as `DS_QUANTILES_SKETCH` or `APPROX_QUANTILE`. Note that `APPROX_QUANTILE` uses the deprecated approximate histogram aggregator, so its accuracy might not be great.
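To make the buffer-sizing argument above more concrete, here is a rough sketch of why a buffer reserved for "one billion items" eventually runs out. It assumes a simplified model of the classic quantiles sketch layout: incoming values first collect in a base buffer of `2k` slots, and each time that buffer fills it is compacted into levels of `k` slots each, with the set of occupied levels following the binary digits of `n // (2k)`. The helper names (`levels_needed`, `retained_capacity`, `fits_in_buffer`) are hypothetical, header bytes are ignored, and `k = 128` is an assumed value (I believe it is the aggregator's default). This is an illustration, not the DataSketches or Druid code.

```python
# Hypothetical model of a preallocated quantiles-sketch buffer (not library code).
DEFAULT_K = 128  # assumption: sketch size parameter k


def levels_needed(k: int, n: int) -> int:
    """Levels of k slots each that the simplified layout needs after n items.

    Level occupancy follows the binary digits of n // (2k), so the number of
    levels is the bit length of that value (0 while n < 2k).
    """
    return (n // (2 * k)).bit_length()


def retained_capacity(k: int, n: int) -> int:
    """Double-valued slots a fully preallocated buffer reserves for n items:
    the 2k-slot base buffer plus k slots per level (header bytes ignored)."""
    return 2 * k + levels_needed(k, n) * k


def fits_in_buffer(k: int, sized_for: int, actual_n: int) -> bool:
    """True if a buffer sized for `sized_for` items can also hold a sketch
    that has ingested `actual_n` items without needing extra levels."""
    return levels_needed(k, actual_n) <= levels_needed(k, sized_for)


if __name__ == "__main__":
    one_billion = 1_000_000_000
    slots = retained_capacity(DEFAULT_K, one_billion)
    print(slots, "slots,", slots * 8, "bytes of doubles")
    print(fits_in_buffer(DEFAULT_K, one_billion, 5_000_000_000))
```

Under this model, a buffer sized for one billion items with `k = 128` reserves 22 levels (3,072 doubles), and the level rounding leaves a little slack, so it keeps fitting until roughly 1.07 billion items. A country with, say, five billion rows would need 25 levels, which is the situation where the sketch has to grow beyond its buffer and hits the NPE described above.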
