haohuaijin opened a new pull request, #22816:
URL: https://github.com/apache/datafusion/pull/22816

   ## Which issue does this PR close?
   
   - Closes #22775.
   
   ## Rationale for this change
   
   The `opt_filter` parameter on `GroupsAccumulator::merge_batch` is dead. 
Aggregate `FILTER` clauses only apply to raw input rows in the update phase 
(`update_batch`). `merge_batch` combines already-aggregated partial states, so 
there is no per-row filtering left to do and `opt_filter` is meaningless there.
   
   The code confirms this:
   - The only production caller (`row_hash.rs`) always passed `None`.
   - Existing implementations already ignored it — e.g. `correlation.rs` 
asserted `opt_filter.is_none()`, and Spark `avg` used `_opt_filter`.
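
   The split between the two phases can be illustrated with a toy model. The 
sketch below is not the DataFusion trait: `SumGroups` and 
`filtered_sum_two_phase` are hypothetical names, and plain slices stand in for 
Arrow arrays. It only shows that the `FILTER` mask is consumed while reading raw 
rows, so the merge step has nothing left to filter.

```rust
/// Simplified stand-in for a grouped SUM accumulator (hypothetical type).
/// Plain slices replace Arrow arrays; this is not the DataFusion API.
#[derive(Default)]
pub struct SumGroups {
    sums: Vec<i64>,
}

impl SumGroups {
    /// Update phase: consumes raw input rows, so the FILTER mask applies here.
    pub fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        self.sums.resize(total_num_groups, 0);
        for (row, (&v, &g)) in values.iter().zip(group_indices).enumerate() {
            // A row is kept when there is no filter or its mask entry is true.
            if opt_filter.map_or(true, |f| f[row]) {
                self.sums[g] += v;
            }
        }
    }

    /// Merge phase: consumes partial sums that were already filtered,
    /// so there is no filter parameter.
    pub fn merge_batch(
        &mut self,
        partial_sums: &[i64],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        self.sums.resize(total_num_groups, 0);
        for (&s, &g) in partial_sums.iter().zip(group_indices) {
            self.sums[g] += s;
        }
    }

    pub fn state(&self) -> &[i64] {
        &self.sums
    }
}

/// Two partitions filter their own raw rows; a final accumulator merges them.
pub fn filtered_sum_two_phase() -> Vec<i64> {
    let mask1: &[bool] = &[true, false, true];
    let mut p1 = SumGroups::default();
    p1.update_batch(&[1, 2, 3], &[0, 1, 0], Some(mask1), 2);

    let mask2: &[bool] = &[true, true];
    let mut p2 = SumGroups::default();
    p2.update_batch(&[10, 20], &[1, 1], Some(mask2), 2);

    let mut fin = SumGroups::default();
    fin.merge_batch(p1.state(), &[0, 1], 2);
    fin.merge_batch(p2.state(), &[0, 1], 2);
    fin.state().to_vec()
}
```

   In this model `filtered_sum_two_phase()` yields `[4, 30]`: the row with 
value `2` is dropped during the update phase, and the merge only adds partial 
sums per group.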
   
   ## What changes are included in this PR?
   
   - Removed `opt_filter` from `merge_batch` in the trait and all 
implementations (built-in aggregates, `physical-expr-common`, 
`functions-aggregate-common`, Spark, and FFI).
   - Updated the trait docs to say `merge_batch` has no `opt_filter` because 
filtering happens in the update phase.
   - Changed the group zero-init path in `row_hash.rs` to always call 
`update_batch` with an all-false filter instead of branching to `merge_batch`. 
`update_batch` always takes raw argument types (what `aggregate_arguments` 
provides), and since every row is filtered out the data never matters. This 
removes a branch and keeps the arguments consistent with the method that 
receives them.
   - Updated all call sites and tests.
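
   The zero-init idea in the third bullet can be sketched in the same toy 
style. This is an assumption-laden illustration, not the code in `row_hash.rs`: 
`MinGroups`, `zero_init_groups`, and `zero_init_demo` are hypothetical, and 
slices again stand in for Arrow arrays. An all-false mask makes the accumulator 
allocate state for every group while no row contributes a value.

```rust
/// Simplified grouped MIN accumulator (hypothetical, not the DataFusion API).
#[derive(Default)]
pub struct MinGroups {
    mins: Vec<Option<i64>>,
}

impl MinGroups {
    pub fn update_batch(
        &mut self,
        values: &[i64],
        group_indices: &[usize],
        opt_filter: Option<&[bool]>,
        total_num_groups: usize,
    ) {
        // Allocate (or keep) one slot per group; new slots start as None.
        self.mins.resize(total_num_groups, None);
        for (row, (&v, &g)) in values.iter().zip(group_indices).enumerate() {
            if opt_filter.map_or(true, |f| f[row]) {
                self.mins[g] = Some(self.mins[g].map_or(v, |m| m.min(v)));
            }
        }
    }
}

/// Zero-initialize group state by feeding raw arguments with an all-false
/// filter: the slots are created, but every row is filtered out.
pub fn zero_init_groups(acc: &mut MinGroups, raw_args: &[i64], total_num_groups: usize) {
    let all_false = vec![false; raw_args.len()];
    // Group indices are never read for filtered-out rows in this sketch.
    let group_indices = vec![0usize; raw_args.len()];
    acc.update_batch(
        raw_args,
        &group_indices,
        Some(all_false.as_slice()),
        total_num_groups,
    );
}

/// Returns the state right after zero-init and after one real update.
pub fn zero_init_demo() -> (Vec<Option<i64>>, Vec<Option<i64>>) {
    let mut acc = MinGroups::default();
    zero_init_groups(&mut acc, &[7, 8], 3);
    let after_init = acc.mins.clone();
    acc.update_batch(&[5, 2], &[2, 2], None, 3);
    (after_init, acc.mins)
}
```

   After `zero_init_groups` all three slots are `None` even though the raw 
arguments were `[7, 8]`; only the later unfiltered update sets group 2 to 
`Some(2)`.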
   
   ## Are these changes tested?
   
   Yes. Existing aggregate tests cover this and were updated to the new 
signature. The `first_last` tests were adjusted (with comments) to match the 
merge behavior without a filter, and the FFI and Spark tests were updated too.
   
   ## Are there any user-facing changes?
   
   Yes — this is a breaking change to the public `GroupsAccumulator` trait: 
`opt_filter` is removed from `merge_batch`. Custom implementations and direct 
callers must update their signatures. Please add the `api change` label.
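
   For downstream code, the migration is expected to be mechanical. The sketch 
below assumes the new method simply drops the `opt_filter` argument; it uses a 
hypothetical `GroupsAccumulatorSketch` trait with stand-in `ArrayRef` and 
`Result` aliases so it compiles on its own. Check the real trait definition in 
the PR for the authoritative signature.

```rust
// Stand-ins so the sketch compiles without DataFusion or Arrow.
type ArrayRef = Vec<i64>;
type Result<T> = std::result::Result<T, String>;

/// Hypothetical mirror of the post-change `merge_batch` shape.
pub trait GroupsAccumulatorSketch {
    // Before the change the method also took
    // `opt_filter: Option<&BooleanArray>` ahead of `total_num_groups`.
    fn merge_batch(
        &mut self,
        values: &[ArrayRef],
        group_indices: &[usize],
        total_num_groups: usize,
    ) -> Result<()>;
}

#[derive(Default)]
pub struct CountGroups {
    counts: Vec<i64>,
}

impl GroupsAccumulatorSketch for CountGroups {
    fn merge_batch(
        &mut self,
        values: &[ArrayRef],
        group_indices: &[usize],
        total_num_groups: usize,
    ) -> Result<()> {
        // An `assert!(opt_filter.is_none())` or `_opt_filter` binding that an
        // implementation carried before is simply deleted.
        let partial = values
            .first()
            .ok_or_else(|| "expected one state column".to_string())?;
        if partial.len() != group_indices.len() {
            return Err("state column and group indices differ in length".to_string());
        }
        self.counts.resize(total_num_groups, 0);
        for (&c, &g) in partial.iter().zip(group_indices) {
            self.counts[g] += c;
        }
        Ok(())
    }
}

/// Direct callers drop the `None` they used to pass for the filter.
pub fn migrated_call() -> Result<Vec<i64>> {
    let mut acc = CountGroups::default();
    // Before: acc.merge_batch(&[vec![3, 1]], &[0, 1], None, 2)?;
    acc.merge_batch(&[vec![3, 1]], &[0, 1], 2)?;
    acc.merge_batch(&[vec![3]], &[1], 2)?;
    Ok(acc.counts)
}
```

   Here `migrated_call()` returns `Ok(vec![3, 4])`, the per-group sum of the 
partial counts.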


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

