gianm opened a new pull request #11950: URL: https://github.com/apache/druid/pull/11950
Add a "guessAggregatorHeapFootprint" method to AggregatorFactory that mitigates #6743 by enabling heap footprint estimates based on a specific number of rows.

The idea is that at ingestion time, the number of rows that go into an aggregator will be 1 (if rollup is off) or will likely be a small number (if rollup is on). It's a heuristic, because of course nothing guarantees that the rollup ratio is a small number. But it's a common case, and I expect this logic to go wrong much less often than the current logic. Also, when it does go wrong, users can fix it by lowering maxRowsInMemory or maxBytesInMemory. The current situation is unintuitive: when the estimation goes wrong, users get an OOME, but actually they need to *raise* these limits to fix it.

I don't think this is an ideal solution. In the future, I'd like to see something that involves the IncrementalIndex having much more direct control over how much memory it uses. Probably in concert with implementing off-heap resizeable aggregators, which is a feature that would be very nice to have for other reasons too. But in the meantime, the approach in this patch is simple, non-invasive, and I think it will help people.
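To make the heuristic concrete, here is a rough Java sketch of how a row-count-based estimate could differ from a worst-case one. It is not the code in this patch: only the method name guessAggregatorHeapFootprint comes from the description above. The FootprintFactory interface, both factory classes, the size constants, and the estimateRowBytes helper are hypothetical stand-ins invented for illustration.

```java
import java.util.List;

final class FootprintEstimateSketch {
    /** Hypothetical stand-in for an aggregator factory; not Druid's actual class. */
    interface FootprintFactory {
        /** Worst-case size in bytes of the aggregator's intermediate state. */
        int maxIntermediateSize();

        /** Assumed default: with no better information, fall back to the worst case. */
        default int guessAggregatorHeapFootprint(long rows) {
            return maxIntermediateSize();
        }
    }

    /** Fixed-size aggregator: the footprint does not depend on the row count. */
    static final class LongSumFactory implements FootprintFactory {
        @Override
        public int maxIntermediateSize() {
            return Long.BYTES;
        }
    }

    /** Growable aggregator (made-up sizes): state grows with rows, up to a cap. */
    static final class GrowableSketchFactory implements FootprintFactory {
        private static final int BYTES_PER_ENTRY = 8;
        private static final int MAX_ENTRIES = 16384;

        @Override
        public int maxIntermediateSize() {
            return BYTES_PER_ENTRY * MAX_ENTRIES;
        }

        @Override
        public int guessAggregatorHeapFootprint(long rows) {
            // Assume one entry per row, at least one entry, never more than the cap.
            long entries = Math.max(1L, Math.min(rows, MAX_ENTRIES));
            return (int) (entries * BYTES_PER_ENTRY);
        }
    }

    /** Sums the per-aggregator guesses for one in-memory row. */
    static long estimateRowBytes(List<FootprintFactory> factories, long rowsPerAggregator) {
        long total = 0;
        for (FootprintFactory factory : factories) {
            total += factory.guessAggregatorHeapFootprint(rowsPerAggregator);
        }
        return total;
    }

    public static void main(String[] args) {
        List<FootprintFactory> factories =
            List.of(new LongSumFactory(), new GrowableSketchFactory());
        // Rollup off: each aggregator sees exactly one input row.
        System.out.println(estimateRowBytes(factories, 1));
        // A very high rollup ratio pushes the guess back up to the worst case.
        System.out.println(estimateRowBytes(factories, 1_000_000));
    }
}
```

With these made-up sizes, the one-row estimate is 16 bytes per in-memory row, while the worst-case estimate is 131080 bytes. Under the worst-case number, a byte limit such as maxBytesInMemory would appear to be reached after far fewer rows than the heap actually requires, which is the kind of mis-estimation the heuristic is meant to reduce.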
