2010YOUY01 commented on PR #22729: URL: https://github.com/apache/datafusion/pull/22729#issuecomment-4611653471
> I'm curious about the high-level vision: is the plan to close #15591 in favor of this new approach?

Yes, the goal is to support blocked state management. The existing challenge is that the current implementation is hard to extend and review. I want to clean things up through this refactor first, and then apply the actual change.

> I would like the redesign of hash aggregation to take into account the memory constraints imposed by the finite memory pool, i.e. how does the implementation perform under OOM conditions.
>
> * how do we improve memory accounting (see [Hash aggregation produces batches reporting huge memory size #22526](https://github.com/apache/datafusion/issues/22526)).
> * how do we avoid excessive memory allocations during OOM condition (see [fix: reduce memory allocation overhead during partial aggregation ear… #22165](https://github.com/apache/datafusion/pull/22165))
> * other issues such as [[EPIC] Eliminate Long Polls in HashAggregate via Chunked Storage and Incremental Emission #19906](https://github.com/apache/datafusion/issues/19906)
>
> Otherwise we'll end up with the same issues that exist now. E.g. EmitTo::First(n) wasn't designed for emitting a large portion of the existing groups, so it over-allocated when used for emitting early in partial aggregation OOM case.

All of these issues are symptoms of managing state in a large contiguous `Vec`. Blocked memory allocation should address them naturally.
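To illustrate the contrast between contiguous and blocked state, here is a minimal sketch. It is not the code in this PR or in DataFusion: `BlockedSums`, `emit_block`, and `emit_first_contiguous` are hypothetical names, and the state is reduced to one `u64` per group. It only shows why emitting a prefix from one large `Vec` needs a second allocation and a copy of the remaining groups, while a blocked layout can hand over a whole block by moving it.

```rust
use std::collections::VecDeque;

/// Hypothetical blocked accumulator state: one `u64` per group,
/// stored in fixed-capacity blocks instead of one contiguous `Vec`.
struct BlockedSums {
    block_size: usize,
    blocks: VecDeque<Vec<u64>>,
}

impl BlockedSums {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: VecDeque::new() }
    }

    /// Append state for a new group. At most one block of `block_size`
    /// elements is allocated; existing blocks are never reallocated.
    fn push(&mut self, value: u64) {
        let need_block = self
            .blocks
            .back()
            .map_or(true, |b| b.len() == self.block_size);
        if need_block {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(value);
    }

    /// Total number of groups across all blocks.
    fn len(&self) -> usize {
        self.blocks.iter().map(Vec::len).sum()
    }

    /// Number of blocks currently held.
    fn num_blocks(&self) -> usize {
        self.blocks.len()
    }

    /// Emit the oldest block by moving it out. The remaining groups are
    /// neither copied nor reallocated.
    fn emit_block(&mut self) -> Option<Vec<u64>> {
        self.blocks.pop_front()
    }
}

/// Contiguous counterpart (an assumption about the general pattern, not
/// DataFusion's exact code): emitting the first `n` groups allocates a
/// new `Vec` for the tail and copies `len - n` elements into it.
fn emit_first_contiguous(state: &mut Vec<u64>, n: usize) -> Vec<u64> {
    let tail = state.split_off(n);
    std::mem::replace(state, tail)
}
```

With a block size of 4 and ten groups, `BlockedSums` holds three blocks, and `emit_block` returns the first four groups without touching the other six. `emit_first_contiguous` returns the same four groups but has to copy the six remaining ones into a fresh allocation, which is the cost that grows when a large share of the groups is emitted early.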
