Re: [PR] refactor: remove `opt_filter` in `GroupsAccumulator::merge_batch` [datafusion]

via GitHub Mon, 08 Jun 2026 07:00:44 -0700


haohuaijin commented on code in PR #22816:
URL: https://github.com/apache/datafusion/pull/22816#discussion_r3373714043



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1230,11 +1232,7 @@ impl GroupedHashAggregateStream {
                 .collect::<Result<Vec<_>>>()?;
             let false_filter = BooleanArray::from(vec![false]);
             for (acc, args) in 
self.accumulators.iter_mut().zip(null_args.iter()) {
-                if self.mode.input_mode() == AggregateInputMode::Raw {
-                    acc.update_batch(args, &[0], Some(&false_filter), 
total_groups)?;
-                } else {
-                    acc.merge_batch(args, &[0], Some(&false_filter), 
total_groups)?;
-                }

Review Comment:
   Note that `aggregate_arguments` depends on the aggregate mode. In Raw mode 
it has raw input arguments, which fit `update_batch`. In Partial input modes it 
has state fields, which normally fit `merge_batch`. If this path is ever 
reached in a Partial input mode, `update_batch` could receive the wrong shape 
of arrays. For example, `AVG` takes one raw input array in `update_batch`, but 
two state arrays in `merge_batch`.
   
   I could not reproduce this through normal SQL/sqllogictest. The usual SQL 
plan initializes empty grouping sets in the Partial stage, so the Final stage 
gets a non-empty state row and does not hit this path. see the test case in 
[dcc9c27](https://github.com/apache/datafusion/pull/22816/commits/dcc9c275c669c4b7290de3b940ae957047238c88)
   
   cc @2010YOUY01 @alamb 
   
   this is only place that use the `merge_batch` with not None 
`opt_filter`(added recently).



##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1230,11 +1232,7 @@ impl GroupedHashAggregateStream {
                 .collect::<Result<Vec<_>>>()?;
             let false_filter = BooleanArray::from(vec![false]);
             for (acc, args) in 
self.accumulators.iter_mut().zip(null_args.iter()) {
-                if self.mode.input_mode() == AggregateInputMode::Raw {
-                    acc.update_batch(args, &[0], Some(&false_filter), 
total_groups)?;
-                } else {
-                    acc.merge_batch(args, &[0], Some(&false_filter), 
total_groups)?;
-                }

Review Comment:
   Note that `aggregate_arguments` depends on the aggregate mode. In Raw mode 
it has raw input arguments, which fit `update_batch`. In Partial input modes it 
has state fields, which normally fit `merge_batch`. If this path is ever 
reached in a Partial input mode, `update_batch` could receive the wrong shape 
of arrays. For example, `AVG` takes one raw input array in `update_batch`, but 
two state arrays in `merge_batch`.
   
   I could not reproduce this through normal SQL/sqllogictest. The usual SQL 
plan initializes empty grouping sets in the Partial stage, so the Final stage 
gets a non-empty state row and does not hit this path. see the test case added 
in 
[dcc9c27](https://github.com/apache/datafusion/pull/22816/commits/dcc9c275c669c4b7290de3b940ae957047238c88)
   
   cc @2010YOUY01 @alamb 
   
   this is only place that use the `merge_batch` with not None 
`opt_filter`(added recently).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] refactor: remove `opt_filter` in `GroupsAccumulator::merge_batch` [datafusion]

Reply via email to