ariel-miculas commented on PR #22729: URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4611260579
I'm curious about the high-level vision: is the plan to close https://github.com/apache/datafusion/pull/15591 in favor of this new approach?

I would like the redesign of hash aggregation to take into account the constraints imposed by a finite memory pool, i.e. how the implementation performs under OOM conditions:

* How do we improve memory accounting (see https://github.com/apache/datafusion/issues/22526)?
* How do we avoid excessive memory allocations under OOM conditions (see https://github.com/apache/datafusion/pull/22165)?
* Other issues, such as https://github.com/apache/datafusion/issues/19906

Otherwise we'll end up with the same issues that exist now. E.g. EmitTo::First(n) wasn't designed for emitting a large portion of the existing groups, so it over-allocated when used to emit early in the partial-aggregation OOM case.
