jayzhan211 commented on code in PR #7242:
URL: https://github.com/apache/arrow-datafusion/pull/7242#discussion_r1399365249
##########
datafusion/physical-expr/src/aggregate/sum.rs:
##########
@@ -101,6 +106,43 @@ impl AggregateExpr for Sum {
}
fn create_accumulator(&self) -> Result<Box<dyn Accumulator>> {
+ if let DataType::List(field) = &self.data_type {
Review Comment:
@alamb I agree that we should not introduce any downgrade especially for
this widely used function.
I think we can either run the comparison that ensure there is no performance
downgrade for `sum` or we just consider the initial approach that we build
`array_aggreate` like other array function.
In current approach, ArraySum is actually very different from Sum. We have
another Accumulator for Array version. I think this is the only place that
might effect the performance of current sum aggregate, where we need to
differentiate Array and non-Array cases.
Actually, the initial goal that we try to done this in Accumulator "To
reduce code duplication" is actually no longer true. Unless we have other
reason that we need ArraySum Accumulator. Maybe we should move this back to
`array_expression.rs`?
1. Is compare with the performance of Sum enough to ensure there should be
no downgrade?
2. Should we have ArraySum Accumulator for other reason or feature?
3. If we implement array aggregate function in `array_expression.rs` is
there any concern for the overall design?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]