alamb commented on issue #8778: URL: https://github.com/apache/arrow-datafusion/issues/8778#issuecomment-1881736195
> An optimizer should combine adjacent partial and final AggregateExecs, if possible.

I agree it should combine the adjacent aggregates if possible.

One reason it may not be combining them is to increase parallelism: the initial partial `AggregateExec` runs in parallel on the raw data stream, and the second, final `AggregateExec` runs after the initial pass has run and combines the partial results. This is partly illustrated here: https://docs.rs/datafusion/latest/datafusion/logical_expr/trait.Accumulator.html#tymethod.state

If you can share an `EXPLAIN PLAN` showing what you are seeing, we can perhaps help diagnose whether there is an improvement to make in DataFusion or whether something else is going on.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
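
To illustrate the partial/final split described in the comment above, here is a minimal standalone sketch of a two-phase grouped average. It is not DataFusion code: `AvgState`, `partial_aggregate`, and `final_aggregate` are hypothetical names chosen for this example, and the only assumption carried over from the discussion is that each partition first produces intermediate state (similar in spirit to what `Accumulator::state` returns) which a later step merges.

```rust
use std::collections::HashMap;

/// Intermediate state for an average: running sum and row count.
/// (Hypothetical type for this sketch, not a DataFusion type.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct AvgState {
    sum: f64,
    count: u64,
}

/// Partial phase: aggregate one partition of (group key, value) rows.
/// Each partition can be processed independently, e.g. on its own thread.
fn partial_aggregate(rows: &[(&str, f64)]) -> HashMap<String, AvgState> {
    let mut states: HashMap<String, AvgState> = HashMap::new();
    for (key, value) in rows {
        let state = states
            .entry(key.to_string())
            .or_insert(AvgState { sum: 0.0, count: 0 });
        state.sum += *value;
        state.count += 1;
    }
    states
}

/// Final phase: merge the per-partition states, then compute the averages.
fn final_aggregate(partials: Vec<HashMap<String, AvgState>>) -> HashMap<String, f64> {
    let mut merged: HashMap<String, AvgState> = HashMap::new();
    for partial in partials {
        for (key, state) in partial {
            let entry = merged
                .entry(key)
                .or_insert(AvgState { sum: 0.0, count: 0 });
            entry.sum += state.sum;
            entry.count += state.count;
        }
    }
    merged
        .into_iter()
        .map(|(key, state)| (key, state.sum / state.count as f64))
        .collect()
}

fn main() {
    // Two input partitions that could be aggregated in parallel.
    let partition_0 = [("a", 1.0), ("b", 10.0), ("a", 3.0)];
    let partition_1 = [("a", 5.0), ("b", 20.0)];

    let partials = vec![
        partial_aggregate(&partition_0),
        partial_aggregate(&partition_1),
    ];
    let averages = final_aggregate(partials);

    println!("{:?}", averages.get("a")); // Some(3.0)
    println!("{:?}", averages.get("b")); // Some(15.0)
}
```

The partial phase sees only its own partition, so it scales with the number of partitions, while the final phase only has to merge one small state per group per partition. Collapsing the two phases into a single aggregate gives up that split, which may be why a planner keeps them separate.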
