Dandandan opened a new issue, #6892: URL: https://github.com/apache/arrow-datafusion/issues/6892
### Is your feature request related to a problem or challenge? Currently, we often use two modes in aggregations:`Partial` + `FinalPartitioned`. `FinalPartitioned` requires the input to be hash-repartioned, so a `RepartionExec` is added in between. In certain cases, like when only aggregating on a single column, it is faster to skip the `Partial` aggregation and directly perform the aggregation on hash-partitioned input, as doing the `Partial` + `RepartionExec` + `FinalPartitioned` will be more work than doing the aggregation in one step (`RepartionExec` + `Partitioned`). Reasoning: RepartionExec (hash) is itself faster than `Partial` (although it doesn't reduce the output) and is necessary for `FinalPartitioned`. ### Describe the solution you'd like 1. Introduce `Partitioned` aggregation mode that requires input to be partitioned on the groupby-keys. 2. Utilize it in certain cases, like aggregations utilizing only one/few columns (consider queries such as`SELECT COUNT(DISTINCT a) FROM T`) ### Describe alternatives you've considered _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
