alamb opened a new issue, #23081: URL: https://github.com/apache/datafusion/issues/23081
### Is your feature request related to a problem or challenge? `GroupsAccumulator` currently has two related APIs: - `convert_to_state(...)`, which converts raw input rows directly into intermediate aggregate state - `supports_convert_to_state()`, which lets hash aggregate decide whether the skip-partial-aggregation optimization is available This optional capability adds branching and complexity in both hash aggregate implementations: - `datafusion/physical-plan/src/aggregates/row_hash.rs` - `datafusion/physical-plan/src/aggregates/aggregate_hash_table/*` It also means some `GroupsAccumulator` implementations may silently miss the high-cardinality fast path. ### Describe the solution you'd like Make `convert_to_state` a required/high-performance part of the `GroupsAccumulator` contract, remove `supports_convert_to_state`, and simplify hash aggregate so skip-partial aggregation no longer needs to check per-accumulator support. Suggested scope: - Audit all `GroupsAccumulator` implementations and ensure each has a correct `convert_to_state` - Remove the default `not_impl` implementation of `convert_to_state` from `datafusion/expr-common/src/groups_accumulator.rs` - Remove `GroupsAccumulator::supports_convert_to_state` - Remove skip-aggregation support checks from hash aggregate code - Update FFI groups accumulator support, currently mirrored via `FFI_GroupsAccumulator::supports_convert_to_state` - Update comments/docs that describe `convert_to_state` as optional - Update tests such as `aggregate_skip_partial.slt` so skip-partial coverage expects all grouped accumulators to support the path Acceptance criteria: - No `supports_convert_to_state` API remains - All in-tree `GroupsAccumulator` implementations compile with an explicit `convert_to_state` - Skip-partial aggregation no longer disables itself due to missing accumulator support - FFI API changes are documented and tested - Upgrading guide explains the required change for external `GroupsAccumulator` implementors ### Describe alternatives you've considered We could keep `supports_convert_to_state` and continue enabling skip-partial aggregation only when all accumulators opt in. That preserves compatibility, but it keeps the hash aggregate code more complex and allows some accumulators to miss the high-cardinality fast path indefinitely. Another option is to rely on `GroupsAccumulatorAdapter` as a generic fallback for all missing implementations. This is correct, but it is row-by-row and may not be high performance. The desired end state should be that native/vectorized group accumulators implement efficient conversions where possible, while any fallback remains explicit. ### Additional context This is likely a breaking public API change for custom aggregate authors and FFI users, so it should be marked as an API change and documented in the upgrading guide. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
