ariel-miculas commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3265468487

##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########

Review Comment:
   The problem isn't that we're not transitioning to ProducingBlocks early (this can be done via the `skip partial agg feature`). The problem is that:
   * emitting a large RecordBatch releases the memory reservation held for that batch, but the memory itself is not freed; it stays in use in the ProducingOutput state while the small output batches are produced
   * ProducingBlocks slices the large RecordBatch into smaller ones, but calling `get_array_memory_size` on a small record batch returns the memory of the entire original large RecordBatch, so spilling is broken for the parent operator (e.g. repartitioning)
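
To illustrate the second point, here is a minimal, std-only sketch of why a zero-copy slice can report the size of the whole original allocation. It is a toy model, not Arrow's or DataFusion's actual code: `SharedSlice`, `backing_memory_size`, and `logical_memory_size` are hypothetical names invented for this example, and the only assumption taken from the comment is that a slice shares the buffer of the batch it was cut from.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for an array: a window over a shared,
/// reference-counted buffer. Not an Arrow type.
struct SharedSlice {
    buffer: Arc<Vec<u64>>,
    offset: usize,
    len: usize,
}

impl SharedSlice {
    fn new(values: Vec<u64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: the new value points into the same buffer.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The values this slice logically covers.
    fn values(&self) -> &[u64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Size of the whole backing buffer, regardless of the window.
    /// This models the behaviour the comment describes.
    fn backing_memory_size(&self) -> usize {
        self.buffer.len() * std::mem::size_of::<u64>()
    }

    /// Size of only the values the slice covers.
    fn logical_memory_size(&self) -> usize {
        self.values().len() * std::mem::size_of::<u64>()
    }
}

fn main() {
    let large = SharedSlice::new((0..8192).collect());
    let small = large.slice(0, 1024);

    println!("backing bytes seen by the slice: {}", small.backing_memory_size());
    println!("logical bytes in the slice:      {}", small.logical_memory_size());

    // Dropping the large value does not free the buffer:
    // the slice still holds a reference to it.
    drop(large);
    println!("owners of the buffer after drop: {}", Arc::strong_count(&small.buffer));
}
```

In this model, a parent operator that sums `backing_memory_size` over eight such slices would count the same buffer eight times, and dropping the original does not free anything while any slice is alive. Both effects mirror the two bullets above, under the stated assumption.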
