haohuaijin opened a new pull request, #22816: URL: https://github.com/apache/datafusion/pull/22816
## Which issue does this PR close?

- Closes #22775.

## Rationale for this change

The `opt_filter` parameter on `GroupsAccumulator::merge_batch` is dead. Aggregate `FILTER` clauses only apply to raw input rows in the update phase (`update_batch`). `merge_batch` combines already pre-aggregated states, so there is no per-row filtering to do and `opt_filter` is meaningless there. The code confirms this:

- The only production caller (`row_hash.rs`) always passed `None`.
- Existing implementations already ignored it: for example, `correlation.rs` asserted `opt_filter.is_none()`, and Spark `avg` bound the parameter as `_opt_filter`, leaving it unused.

## What changes are included in this PR?

- Removed `opt_filter` from `merge_batch` in the trait and all implementations (built-in aggregates, `physical-expr-common`, `functions-aggregate-common`, Spark, and FFI).
- Updated the trait docs to say `merge_batch` has no `opt_filter` because filtering happens in the update phase.
- Changed the group zero-init path in `row_hash.rs` to always use `update_batch` with an all-false filter instead of branching to `merge_batch`. `update_batch` always takes raw argument types, which is what `aggregate_arguments` provides, and because every row is filtered out the data never matters. This is simpler and more correct.
- Updated all call sites and tests.

## Are these changes tested?

Yes. Existing aggregate tests cover this and were updated to the new signature. The `first_last` tests were adjusted (with comments) to match the merge behavior without a filter, and the FFI and Spark tests were updated too.

## Are there any user-facing changes?

Yes. This is a breaking change to the public `GroupsAccumulator` trait: `opt_filter` is removed from `merge_batch`. Custom implementations and direct callers must update their signatures. Please add the `api change` label.
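
To illustrate the split between the two phases, here is a minimal sketch that uses only the Rust standard library. It is not the real DataFusion API: `SimpleGroupsAccumulator`, `SumAccumulator`, and `demo` are hypothetical names, and plain slices stand in for Arrow arrays. It assumes a sum aggregate, shows `update_batch` honoring a filter while `merge_batch` takes none, and mimics the zero-init idea of calling `update_batch` with an all-false filter.

```rust
/// Hypothetical, simplified stand-in for a grouped accumulator.
/// The real trait works on Arrow arrays; slices are used here for brevity.
trait SimpleGroupsAccumulator {
    /// Update phase: sees raw input rows, so a FILTER mask applies here.
    fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    );

    /// Merge phase: combines pre-aggregated states, so there is no filter.
    fn merge_batch(
        &mut self,
        partial_sums: &[i64],
        group_indices: &[usize],
        total_num_groups: usize,
    );

    /// Returns the current per-group state.
    fn state(&self) -> Vec<i64>;
}

struct SumAccumulator {
    sums: Vec<i64>,
}

impl SumAccumulator {
    fn new() -> Self {
        Self { sums: Vec::new() }
    }

    /// Make sure every group has a slot, initialized to zero.
    fn ensure_groups(&mut self, total_num_groups: usize) {
        if self.sums.len() < total_num_groups {
            self.sums.resize(total_num_groups, 0);
        }
    }
}

impl SimpleGroupsAccumulator for SumAccumulator {
    fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        self.ensure_groups(total_num_groups);
        for (row, (&value, &group)) in values.iter().zip(group_indices).enumerate() {
            // A row contributes only if there is no filter or the filter keeps it.
            let keep = opt_filter.map_or(true, |filter| filter[row]);
            if keep {
                self.sums[group] += value;
            }
        }
    }

    fn merge_batch(
        &mut self,
        partial_sums: &[i64],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        self.ensure_groups(total_num_groups);
        for (&partial, &group) in partial_sums.iter().zip(group_indices) {
            self.sums[group] += partial;
        }
    }

    fn state(&self) -> Vec<i64> {
        self.sums.clone()
    }
}

/// Runs a partial phase, a final phase, and a zero-init call.
fn demo() -> (Vec<i64>, Vec<i64>) {
    // Partial phase: the filter drops the second raw row.
    let filter = [true, false, true, true];
    let mut partial = SumAccumulator::new();
    partial.update_batch(&[1, 2, 3, 4], &[0, 1, 0, 1], Some(&filter[..]), 2);

    // Final phase: merge two partial states; no filter is involved.
    let mut final_acc = SumAccumulator::new();
    final_acc.merge_batch(&partial.state(), &[0, 1], 2);
    final_acc.merge_batch(&[10, 20], &[0, 1], 2);

    // Zero-init sketch: an all-false filter creates the groups but ignores the data.
    let all_false = [false, false];
    let mut zeroed = SumAccumulator::new();
    zeroed.update_batch(&[99, 99], &[0, 1], Some(&all_false[..]), 2);

    (final_acc.state(), zeroed.state())
}

fn main() {
    let (merged, zeroed) = demo();
    println!("merged = {:?}, zero-initialized = {:?}", merged, zeroed);
}
```

In this sketch the filtered row never reaches the partial state, so the merge step has nothing left to filter. The all-false call leaves each group at its initial value regardless of the values passed in.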
