RyanJamesStewart opened a new pull request, #22416: URL: https://github.com/apache/datafusion/pull/22416
## Which issue does this PR close?

Related to #22164 and #22165. This extends the same fix to the min/max accumulators.

## Rationale for this change

#22165 fixed an allocation problem in `EmitTo::First(n)`: `drain(..n).collect()` always allocates `n` elements and leaves the retained buffer at its pre-emit capacity, so an OOM-triggered emit (where `n` is close to the buffer length) ends up copying the largest allocation.

The same idiom is still present in two accumulators that #22165 did not touch:

- `MinMaxStructAccumulator::emit_to` in `datafusion/functions-aggregate/src/min_max/min_max_struct.rs`
- `MinMaxBytesAccumulator::emit_to` in `datafusion/functions-aggregate/src/min_max/min_max_bytes.rs`

Both emit a prefix of their `min_max` group buffer with `self.min_max.drain(..n).collect()`.

## What changes are included in this PR?

Adds a `split_vec_min_alloc` helper in `min_max.rs` that allocates `min(n, len - n)` by choosing `drain+collect` or `split_off+replace` depending on which side is smaller, and routes both accumulators through it.

The helper is duplicated from `datafusion-physical-plan`'s `split_vec_min_alloc` (added in #22165) because that one is `pub(super)`-scoped to a different crate. It is a generic `Vec<T>` utility with no aggregate-specific logic. If maintainers prefer, it could instead be hoisted into `datafusion-common` and shared by both crates; happy to do that in this PR or as a follow-up. I went with the local copy to keep the change minimal and avoid touching recently merged code.

## Are these changes tested?

Yes. Unit tests cover both branches of the helper plus the `n == len` and `n == 0` boundaries. Behavior of `emit_to` is unchanged. `cargo test -p datafusion-functions-aggregate` passes (128 tests).

## Are there any user-facing changes?

No.
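
For readers who have not seen #22165, the `drain+collect` versus `split_off+replace` choice described above can be sketched as follows. This is only an illustration built from the PR description: the signature, the bounds check, and the tie-breaking when both sides are equal are assumptions, and the actual helper in `min_max.rs` may differ.

```rust
/// Hypothetical sketch of a `split_vec_min_alloc`-style helper (signature assumed).
/// Removes and returns the first `n` elements of `v`, keeping the rest in `v`.
/// The one new allocation is sized for the smaller of the two sides.
fn split_vec_min_alloc<T>(v: &mut Vec<T>, n: usize) -> Vec<T> {
    assert!(n <= v.len(), "n must not exceed the buffer length");
    if n <= v.len() - n {
        // The emitted prefix is the smaller side: collect it into a new Vec
        // of about `n` elements. `v` keeps its existing buffer.
        v.drain(..n).collect()
    } else {
        // The retained suffix is the smaller side: `split_off` allocates a
        // Vec for the `len - n` tail elements, then the original buffer
        // (now holding only the prefix) is handed back to the caller.
        let tail = v.split_off(n);
        std::mem::replace(v, tail)
    }
}
```

Under these assumptions, a call such as `split_vec_min_alloc(&mut self.min_max, n)` would stand in for `self.min_max.drain(..n).collect()`. In the second branch the large original buffer leaves with the emitted prefix, so the accumulator retains only the small tail allocation.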
