haohuaijin commented on code in PR #22816:
URL: https://github.com/apache/datafusion/pull/22816#discussion_r3373714043
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1230,11 +1232,7 @@ impl GroupedHashAggregateStream {
.collect::<Result<Vec<_>>>()?;
let false_filter = BooleanArray::from(vec![false]);
for (acc, args) in
self.accumulators.iter_mut().zip(null_args.iter()) {
- if self.mode.input_mode() == AggregateInputMode::Raw {
- acc.update_batch(args, &[0], Some(&false_filter),
total_groups)?;
- } else {
- acc.merge_batch(args, &[0], Some(&false_filter),
total_groups)?;
- }
Review Comment:
Note that `aggregate_arguments` depends on the aggregate mode. In Raw mode
it has raw input arguments, which fit `update_batch`. In Partial input modes it
has state fields, which normally fit `merge_batch`. If this path is ever
reached in a Partial input mode, `update_batch` could receive the wrong shape
of arrays. For example, `AVG` takes one raw input array in `update_batch`, but
two state arrays in `merge_batch`.
I could not reproduce this through normal SQL/sqllogictest. The usual SQL
plan initializes empty grouping sets in the Partial stage, so the Final stage
gets a non-empty state row and does not hit this path. see the test case in
[dcc9c27](https://github.com/apache/datafusion/pull/22816/commits/dcc9c275c669c4b7290de3b940ae957047238c88)
cc @2010YOUY01 @alamb
this is only place that use the `merge_batch` with not None
`opt_filter`(added recently).
##########
datafusion/physical-plan/src/aggregates/row_hash.rs:
##########
@@ -1230,11 +1232,7 @@ impl GroupedHashAggregateStream {
.collect::<Result<Vec<_>>>()?;
let false_filter = BooleanArray::from(vec![false]);
for (acc, args) in
self.accumulators.iter_mut().zip(null_args.iter()) {
- if self.mode.input_mode() == AggregateInputMode::Raw {
- acc.update_batch(args, &[0], Some(&false_filter),
total_groups)?;
- } else {
- acc.merge_batch(args, &[0], Some(&false_filter),
total_groups)?;
- }
Review Comment:
Note that `aggregate_arguments` depends on the aggregate mode. In Raw mode
it has raw input arguments, which fit `update_batch`. In Partial input modes it
has state fields, which normally fit `merge_batch`. If this path is ever
reached in a Partial input mode, `update_batch` could receive the wrong shape
of arrays. For example, `AVG` takes one raw input array in `update_batch`, but
two state arrays in `merge_batch`.
I could not reproduce this through normal SQL/sqllogictest. The usual SQL
plan initializes empty grouping sets in the Partial stage, so the Final stage
gets a non-empty state row and does not hit this path. see the test case added
in
[dcc9c27](https://github.com/apache/datafusion/pull/22816/commits/dcc9c275c669c4b7290de3b940ae957047238c88)
cc @2010YOUY01 @alamb
this is only place that use the `merge_batch` with not None
`opt_filter`(added recently).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]