ariel-miculas commented on code in PR #15591:
URL: https://github.com/apache/datafusion/pull/15591#discussion_r3265468487


##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########


Review Comment:
   The problem isn't that we fail to transition to ProducingBlocks early
   (that can already be done via the `skip partial agg` feature). The problem is that:
   * Emitting a large RecordBatch releases the memory reservation held for
     that batch, but the memory itself is not freed: it stays in use in the
     ProducingOutput state while the small output batches are produced.
   * ProducingBlocks slices the large RecordBatch into smaller ones, but
     calling `get_array_memory_size` on a small record batch returns the
     memory of the entire original large RecordBatch, so spilling is broken
     for the parent operator (e.g. repartitioning).
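
To illustrate both points, here is a minimal std-only sketch of the mechanism. It does not use the arrow or DataFusion APIs: `SlicedColumn`, `buffer_memory_size` and `logical_size` are hypothetical names invented for this example, and the model assumes only that a slice is a zero-copy view over a reference-counted buffer.

```rust
use std::mem::size_of;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow-style column: a (offset, len) view
/// into a reference-counted buffer that may be shared with other views.
#[derive(Clone)]
struct SlicedColumn {
    buffer: Arc<Vec<u64>>,
    offset: usize,
    len: usize,
}

impl SlicedColumn {
    fn new(values: Vec<u64>) -> Self {
        let len = values.len();
        Self { buffer: Arc::new(values), offset: 0, len }
    }

    /// Zero-copy slice: only the view changes, the buffer stays shared.
    fn slice(&self, offset: usize, len: usize) -> Self {
        assert!(offset + len <= self.len, "slice out of bounds");
        Self {
            buffer: Arc::clone(&self.buffer),
            offset: self.offset + offset,
            len,
        }
    }

    /// The values visible through this view.
    fn values(&self) -> &[u64] {
        &self.buffer[self.offset..self.offset + self.len]
    }

    /// Size of the whole shared buffer, regardless of how small the view is.
    /// This models the accounting behaviour described in the second bullet.
    fn buffer_memory_size(&self) -> usize {
        self.buffer.len() * size_of::<u64>()
    }

    /// Bytes actually covered by the view.
    fn logical_size(&self) -> usize {
        self.values().len() * size_of::<u64>()
    }
}
```

With `let big = SlicedColumn::new(vec![0u64; 1024]);` and `let small = big.slice(0, 8);`, the view covers 64 bytes but `small.buffer_memory_size()` still reports all 8192 bytes. Dropping `big` does not free the buffer either, because `small` still holds a reference to it. In this model that is why releasing the reservation for the large batch does not correspond to memory actually being returned.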



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

