hhhizzz opened a new pull request, #23182:
URL: https://github.com/apache/datafusion/pull/23182

   ## Which issue does this PR close?
   
   Short-term solution for #23178.
   
   ## Rationale for this change
   
   PR #23055 changed final hash aggregate output to emit groups incrementally
   with `EmitTo::First(batch_size)`. For terminal final aggregate output, this
   can cause the group value state to be repeatedly compacted while output
   batches are being produced. On TPC-DS q23 this showed up as a significant
   regression.
   
   This PR implements the short-term approach discussed in #23178: materialize
   the final aggregate output once, then return slices of that materialized
   `RecordBatch` according to `batch_size`.
   
   This avoids changing the `GroupValues` API while preserving bounded
   downstream batch sizes.
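
The cost difference described above can be illustrated with a deliberately simplified model. This is a sketch only: a `Vec<u64>` stands in for the group value state, `Vec::drain` stands in for compaction after `EmitTo::First(batch_size)`, and the function names are hypothetical, not DataFusion APIs.

```rust
/// Simplified stand-in for emitting the first `n` groups from state.
/// Returns the emitted groups and the number of remaining elements that
/// had to be shifted to the front (the "compaction" cost in this model).
fn emit_first(state: &mut Vec<u64>, n: usize) -> (Vec<u64>, usize) {
    let n = n.min(state.len());
    // `drain(..n)` removes the prefix and moves the tail down to index 0.
    let emitted: Vec<u64> = state.drain(..n).collect();
    (emitted, state.len())
}

/// Total elements shifted when `total` groups are emitted incrementally.
fn incremental_shift_cost(total: usize, batch_size: usize) -> usize {
    assert!(batch_size > 0, "batch_size must be positive");
    let mut state: Vec<u64> = (0..total as u64).collect();
    let mut shifted = 0;
    while !state.is_empty() {
        let (_batch, moved) = emit_first(&mut state, batch_size);
        shifted += moved;
    }
    shifted
}

/// Materialize everything once, then hand out bounded slices.
/// No remaining state is shifted between output batches.
fn materialized_batches(total: usize, batch_size: usize) -> Vec<Vec<u64>> {
    assert!(batch_size > 0, "batch_size must be positive");
    let all: Vec<u64> = (0..total as u64).collect();
    all.chunks(batch_size).map(|c| c.to_vec()).collect()
}
```

In this model the incremental path shifts on the order of `total^2 / batch_size` elements in aggregate, while the materialized path still yields batches of at most `batch_size` rows.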
   
   ## What changes are included in this PR?
   
   - Adds an `OutputtingMaterialized` hash aggregate state.
   - Adds `MaterializedOutput`, a small wrapper around a `RecordBatch` plus
     output offset.
   - Changes final hash aggregate output to:
     - emit all final groups once,
     - evaluate all final aggregate values once,
     - slice the materialized batch for subsequent output polling.
   - Leaves partial aggregate output behavior unchanged.
   - Adds focused tests for materialized output slicing and final hash aggregate
     output state transitions.
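
A minimal sketch of how the state transitions listed above might fit together. Only the names `OutputtingMaterialized` and `MaterializedOutput` come from this PR; the fields, the `Accumulating` and `Done` variants, `next_slice`, and `poll_output` are assumptions for illustration, and a `Vec<i64>` stands in for the Arrow `RecordBatch`.

```rust
/// Stand-in for the PR's `MaterializedOutput`: materialized rows plus an
/// output offset. (The real type wraps a `RecordBatch`.)
struct MaterializedOutput {
    rows: Vec<i64>,
    offset: usize,
}

impl MaterializedOutput {
    fn new(rows: Vec<i64>) -> Self {
        Self { rows, offset: 0 }
    }

    /// Returns the next slice of at most `batch_size` rows, or `None`
    /// once every row has been handed out.
    fn next_slice(&mut self, batch_size: usize) -> Option<&[i64]> {
        assert!(batch_size > 0, "batch_size must be positive");
        if self.offset >= self.rows.len() {
            return None;
        }
        let start = self.offset;
        let end = (start + batch_size).min(self.rows.len());
        self.offset = end;
        Some(&self.rows[start..end])
    }
}

/// Hypothetical, reduced set of stream states.
enum State {
    /// Final values are still held in aggregation state.
    Accumulating(Vec<i64>),
    /// All groups were emitted and evaluated once; now slicing.
    OutputtingMaterialized(MaterializedOutput),
    Done,
}

/// One output poll: materialize once, then return bounded slices.
fn poll_output(state: &mut State, batch_size: usize) -> Option<Vec<i64>> {
    loop {
        match state {
            State::Accumulating(groups) => {
                let all = std::mem::take(groups);
                *state = State::OutputtingMaterialized(MaterializedOutput::new(all));
            }
            State::OutputtingMaterialized(out) => match out.next_slice(batch_size) {
                Some(slice) => return Some(slice.to_vec()),
                None => *state = State::Done,
            },
            State::Done => return None,
        }
    }
}
```

The sketch copies each slice with `to_vec` for simplicity.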
   
   ## Performance
   TPC-DS SF10, all 99 queries, 10 rounds (ratios below 1 mean this PR is
   faster):
   
   - Total runtime ratio: `0.857051`
   - Geomean ratio: `0.976652` (~2.4% faster)
   - q23 ratio: `0.313770` (~218.7% faster), faster in `10/10` rounds
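
The "% faster" figures are consistent with reading each ratio as new runtime divided by old runtime and reporting the implied speedup. That reading is an assumption, and `percent_faster` below is a hypothetical helper, not part of the benchmark tooling:

```rust
/// Converts a runtime ratio (assumed to be new / old) into the implied
/// "percent faster" speedup: (old / new - 1) * 100.
fn percent_faster(ratio: f64) -> f64 {
    (1.0 / ratio - 1.0) * 100.0
}
```

With this formula, `percent_faster(0.313770)` is about 218.7 and `percent_faster(0.976652)` is about 2.4, matching the values quoted above. By the same formula the total runtime ratio of `0.857051` corresponds to roughly 16.7% faster.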
   
   Ten queries regressed by more than 5%. Most have small absolute deltas.
   Sorted by absolute slowdown, they are:
   
   - q67: `1.055907`, +170.996 ms
   - q39: `1.060436`, +98.544 ms
   - q9: `1.050135`, +37.858 ms
   - q70: `1.061124`, +11.848 ms
   - q35: `1.052392`, +9.386 ms
   - q33: `1.063655`, +6.995 ms
   - q98: `1.071688`, +6.515 ms
   - q91: `1.109819`, +5.362 ms
   - q15: `1.058356`, +5.072 ms
   - q27: `1.057686`, +0.815 ms
   
   Overall, this recovers the q23 regression strongly and improves full-query
   geomean, but q39 and q67 are worth calling out as residual per-query
   slowdowns.
   
   ## Testing
   
   - `cargo fmt --all -- --check`
   - `cargo test -p datafusion-physical-plan materializ`
   - `cargo test -p datafusion-physical-plan aggregates::`
   - TPC-DS SF10 q23, 3 rounds
   - TPC-DS SF10 full 99 queries, 10 rounds


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

