milenkovicm opened a new issue, #3460: URL: https://github.com/apache/arrow-datafusion/issues/3460
**Describe the bug**
Row hash aggregation loads the whole aggregation state into memory before sending a single batch downstream. The resulting record batch can have more rows than the predefined batch size.

The problematic part of the code is https://github.com/milenkovicm/arrow-datafusion/blob/17f069df4227b837cf2741a545c39a8b68d5fd76/datafusion/core/src/physical_plan/aggregates/row_hash.rs#L438 where an iterator without limits is created and the whole state is cloned, which doubles the memory needed for the aggregation state.

The function `poll_next` creates a single batch: https://github.com/milenkovicm/arrow-datafusion/blob/17f069df4227b837cf2741a545c39a8b68d5fd76/datafusion/core/src/physical_plan/aggregates/row_hash.rs#L146

**To Reproduce**
Run an aggregation.

**Expected behavior**
The aggregation result should be chunked according to the predefined batch size.
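
To illustrate the expected behavior, here is a minimal sketch of chunked output. It is not the DataFusion implementation: `GroupState`, `Batch`, and `ChunkedOutput` are hypothetical stand-ins for the aggregation state, a record batch, and the output stream. The idea is to keep an offset into the state and copy at most `batch_size` rows per call, rather than cloning the whole state into one batch.

```rust
/// Hypothetical stand-in for one group's accumulated state.
#[derive(Debug, Clone, PartialEq)]
pub struct GroupState {
    pub key: u64,
    pub sum: i64,
}

/// Hypothetical stand-in for an output record batch.
#[derive(Debug, PartialEq)]
pub struct Batch {
    pub rows: Vec<GroupState>,
}

/// Emits the aggregation state in batches of at most `batch_size` rows.
pub struct ChunkedOutput {
    state: Vec<GroupState>,
    offset: usize,
    batch_size: usize,
}

impl ChunkedOutput {
    pub fn new(state: Vec<GroupState>, batch_size: usize) -> Self {
        assert!(batch_size > 0, "batch_size must be non-zero");
        Self { state, offset: 0, batch_size }
    }
}

impl Iterator for ChunkedOutput {
    type Item = Batch;

    fn next(&mut self) -> Option<Batch> {
        // All rows have been emitted.
        if self.offset >= self.state.len() {
            return None;
        }
        // Copy only the next slice, never the whole state.
        let end = (self.offset + self.batch_size).min(self.state.len());
        let rows = self.state[self.offset..end].to_vec();
        self.offset = end;
        Some(Batch { rows })
    }
}
```

With this shape, a `poll_next`-style method could return one bounded batch per call, so the extra memory per call is bounded by `batch_size` rows instead of the size of the full state.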
