gianm opened a new pull request #11950:
URL: https://github.com/apache/druid/pull/11950


   Add a "guessAggregatorHeapFootprint" method to AggregatorFactory that 
mitigates #6743 by enabling heap footprint estimates based on a specific number 
of rows. The idea is that at ingestion time, the number of rows that go into an 
aggregator will be 1 (if rollup is off) or will likely be a small number (if 
rollup is on).
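
A minimal sketch of the idea, not the code in this patch: the types below are simplified stand-ins rather than Druid's real classes, and every name other than `guessAggregatorHeapFootprint` (the interface, the two example factories, `getMaxIntermediateSize`, `estimateRowBytes`, and the byte constants) is hypothetical. It assumes the method defaults to the worst-case size and that an aggregator with a growable footprint overrides it to scale with the row count.

```java
import java.util.List;

public class FootprintSketch {
    /** Simplified stand-in for an aggregator factory; not Druid's actual class. */
    interface AggregatorFactorySketch {
        /** Worst-case bytes one aggregator may need, regardless of row count. */
        int getMaxIntermediateSize();

        /** Estimated heap bytes after aggregating {@code rows} rows; defaults to the worst case. */
        default int guessAggregatorHeapFootprint(long rows) {
            return getMaxIntermediateSize();
        }
    }

    /** Hypothetical sketch-style aggregator whose footprint grows with the rows it has seen. */
    static final class GrowingSketchFactory implements AggregatorFactorySketch {
        private static final int BYTES_PER_ENTRY = 8;   // assumed cost per retained entry
        private static final int MAX_ENTRIES = 16_384;  // assumed capacity of the sketch

        @Override
        public int getMaxIntermediateSize() {
            return BYTES_PER_ENTRY * MAX_ENTRIES;
        }

        @Override
        public int guessAggregatorHeapFootprint(long rows) {
            // At most one entry per row, clamped to [1, MAX_ENTRIES].
            long entries = Math.min(Math.max(rows, 1), MAX_ENTRIES);
            return (int) (BYTES_PER_ENTRY * entries);
        }
    }

    /** Fixed-size aggregator (a long sum) that simply keeps the default estimate. */
    static final class LongSumFactory implements AggregatorFactorySketch {
        @Override
        public int getMaxIntermediateSize() {
            return Long.BYTES;
        }
    }

    /** Sums the per-aggregator estimates for one in-memory row. */
    static long estimateRowBytes(List<AggregatorFactorySketch> factories, long rowsPerAggregator) {
        long total = 0;
        for (AggregatorFactorySketch factory : factories) {
            total += factory.guessAggregatorHeapFootprint(rowsPerAggregator);
        }
        return total;
    }

    public static void main(String[] args) {
        List<AggregatorFactorySketch> factories =
            List.of(new GrowingSketchFactory(), new LongSumFactory());

        long worstCase = 0;
        for (AggregatorFactorySketch factory : factories) {
            worstCase += factory.getMaxIntermediateSize();
        }

        System.out.println("worst-case estimate per row: " + worstCase);
        System.out.println("estimate with 1 row per aggregator: " + estimateRowBytes(factories, 1));
    }
}
```

With one row per aggregator, the estimate in this sketch is a small fraction of the worst-case figure, which is the gap the heuristic is meant to close.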
   
   It's a heuristic, because of course nothing guarantees that the rollup ratio 
is a small number. But it's a common case, and I expect this logic to go wrong 
much less often than the current logic. Also, when it does go wrong, users can 
fix it by lowering maxRowsInMemory or maxBytesInMemory. The current situation 
is unintuitive: when the estimation goes wrong, users get an OOME, but actually 
they need to *raise* these limits to fix it.
   
   I don't think this is an ideal solution. In the future, I'd like to see 
something that gives the IncrementalIndex much more direct control over how 
much memory it uses, probably in concert with off-heap resizable aggregators, 
a feature that would be very nice to have for other reasons too. But in the 
meantime, the approach in this patch is simple, non-invasive, and I think it 
will help people.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


