RyanJamesStewart opened a new pull request, #22416:
URL: https://github.com/apache/datafusion/pull/22416

   ## Which issue does this PR close?
   
   Related to #22164 and #22165. This extends the same fix to the min/max 
accumulators.
   
   ## Rationale for this change
   
   #22165 fixed an allocation problem in `EmitTo::First(n)`: 
`drain(..n).collect()` always allocates a new `n`-element vector and leaves the 
retained buffer at its pre-emit capacity, so an OOM-triggered emit (where `n` is 
close to the buffer length) makes its largest allocation just when memory is 
scarcest.
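
A minimal sketch of the idiom described above, for illustration only. The function name `emit_first_naive` is hypothetical and is not taken from DataFusion; it only isolates the `drain(..n).collect()` pattern on a plain `Vec<T>`.

```rust
/// Hypothetical stand-in for the `drain(..n).collect()` idiom (not DataFusion code).
/// Returns the first `n` elements and leaves the remainder in `buf`.
fn emit_first_naive<T>(buf: &mut Vec<T>, n: usize) -> Vec<T> {
    // `collect` builds a fresh Vec holding the `n` drained elements.
    // `drain` shifts the tail to the front but never shrinks `buf`,
    // so `buf` keeps the capacity it had before the emit.
    buf.drain(..n).collect()
}
```

With a 1000-element buffer and `n = 990`, this returns a new 990-element vector while the 10 retained elements still sit in a buffer with the original capacity.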
   
   The same idiom is still present in two accumulators that #22165 did not 
touch:
   
   - `MinMaxStructAccumulator::emit_to` in 
`datafusion/functions-aggregate/src/min_max/min_max_struct.rs`
   - `MinMaxBytesAccumulator::emit_to` in 
`datafusion/functions-aggregate/src/min_max/min_max_bytes.rs`
   
   Both emit a prefix of their `min_max` group buffer with 
`self.min_max.drain(..n).collect()`.
   
   ## What changes are included in this PR?
   
   Adds a `split_vec_min_alloc` helper in `min_max.rs` that allocates only 
`min(n, len - n)` elements by choosing `drain+collect` or `split_off+replace` 
depending on which side is smaller, and routes both accumulators through it.
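
The strategy above could be sketched roughly as follows. This is an assumption-laden illustration, not the helper from this PR or from #22165: the signature, the `assert!`, and the tie-breaking rule when both sides are equal are guesses, and `replace` is assumed to mean `std::mem::replace`.

```rust
/// Sketch of a min-allocation split (assumed signature, not the PR's code).
/// Returns the first `n` elements of `v` and leaves the rest in `v`.
fn split_vec_min_alloc<T>(v: &mut Vec<T>, n: usize) -> Vec<T> {
    assert!(n <= v.len(), "n must not exceed the vector length");
    if n <= v.len() - n {
        // The emitted prefix is the smaller side: allocate a new Vec
        // for the `n` drained elements and keep the original buffer.
        v.drain(..n).collect()
    } else {
        // The retained tail is the smaller side: `split_off` allocates
        // a new Vec for the `len - n` tail elements, then the original
        // buffer (now holding only the prefix) is handed out in its place.
        let tail = v.split_off(n);
        std::mem::replace(v, tail)
    }
}
```

Either branch yields the same prefix and remainder; only the side that receives the new allocation differs. For example, splitting `vec![1, 2, 3, 4, 5]` at `n = 4` returns `[1, 2, 3, 4]` and leaves `[5]` behind, with the new allocation made for the single retained element.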
   
   The helper is duplicated from `datafusion-physical-plan`'s 
`split_vec_min_alloc` (added in #22165) because that one is `pub(super)` and 
lives in a different crate. It is a generic `Vec<T>` utility with no 
aggregate-specific logic. If maintainers prefer, it could instead be hoisted 
into `datafusion-common` and shared by both crates; happy to do that in this PR 
or as a follow-up. I went with the local copy to keep the change minimal and 
avoid touching recently merged code.
   
   ## Are these changes tested?
   
   Yes. Unit tests cover both branches of the helper plus the `n == len` and `n 
== 0` boundaries. Behavior of `emit_to` is unchanged. `cargo test -p 
datafusion-functions-aggregate` passes (128 tests).
   
   ## Are there any user-facing changes?
   
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

