goldmedal commented on issue #15383: URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2763293892
@Dandandan I have a draft https://github.com/goldmedal/datafusion/pull/3 based on #15423 for `HashAggregate`. Could you check if it's heading in the right direction?

When the selection vector mode is enabled:
- `CoalesceBatchesExec` is not added for `FinalPartitioned`.
- The selection vector is used to filter the required rows before merging batches.

The plan looks like this:
```
> create table t(c int) as values (1), (1), (1), (1), (2), (2), (3), (3)
> explain select count(distinct c) from t;
+---------------+--------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                             |
+---------------+--------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: count(alias1) AS count(DISTINCT t.c)                                                 |
|               |   Aggregate: groupBy=[[]], aggr=[[count(alias1)]]                                                |
|               |     Aggregate: groupBy=[[t.c AS alias1]], aggr=[[]]                                              |
|               |       TableScan: t projection=[c]                                                                |
| physical_plan | ProjectionExec: expr=[count(alias1)@0 as count(DISTINCT t.c)]                                    |
|               |   AggregateExec: mode=Final, gby=[], aggr=[count(alias1)]                                        |
|               |     CoalescePartitionsExec                                                                       |
|               |       AggregateExec: mode=Partial, gby=[], aggr=[count(alias1)]                                  |
|               |         AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[]                  |
|               |           RepartitionExec: partitioning=HashSelectionVector([alias1@0], 12), input_partitions=12 |
|               |             RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                |
|               |               AggregateExec: mode=Partial, gby=[c@0 as alias1], aggr=[]                          |
|               |                 DataSourceExec: partitions=1, partition_sizes=[1]                                |
|               |                                                                                                  |
+---------------+--------------------------------------------------------------------------------------------------+
```

I'll review more aggregation patterns and add additional tests. Thanks.

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
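For readers unfamiliar with the idea, here is a rough sketch of what "use the selection vector to filter the required rows before merging batches" could mean. It is not the code from the draft PR and does not use DataFusion or Arrow types. `Batch`, `SelectedBatch`, `partition_of`, `repartition_with_selection`, and `filter_and_merge` are hypothetical names invented for illustration, and a plain `Vec<i32>` stands in for a record batch. The assumption is that a hash repartition can hand each output partition the same underlying rows plus a list of row indices, so rows are copied only once, when the consumer merges them.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for a record batch: a single integer column.
#[derive(Debug, Clone, PartialEq)]
struct Batch {
    values: Vec<i32>,
}

/// A borrowed batch plus the indices of the rows that belong to one
/// output partition. No rows are copied when this is created.
struct SelectedBatch<'a> {
    batch: &'a Batch,
    selection: Vec<usize>,
}

/// Map a value to an output partition by hashing it.
fn partition_of(value: i32, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Hash-repartition a batch by building one selection vector per
/// partition instead of materializing one small batch per partition.
fn repartition_with_selection(batch: &Batch, num_partitions: usize) -> Vec<SelectedBatch<'_>> {
    let mut selections = vec![Vec::new(); num_partitions];
    for (row, value) in batch.values.iter().enumerate() {
        selections[partition_of(*value, num_partitions)].push(row);
    }
    selections
        .into_iter()
        .map(|selection| SelectedBatch { batch, selection })
        .collect()
}

/// Consumer side: apply each selection vector and merge the selected
/// rows into a single output batch in one pass.
fn filter_and_merge(inputs: &[SelectedBatch<'_>]) -> Batch {
    let mut values = Vec::new();
    for input in inputs {
        values.extend(input.selection.iter().map(|&row| input.batch.values[row]));
    }
    Batch { values }
}
```

With the table from the example (`1, 1, 1, 1, 2, 2, 3, 3`), calling `repartition_with_selection(&batch, 12)` would yield twelve selection vectors over the same eight rows, and a consumer would call `filter_and_merge` on whatever selected batches it receives. Because filtering and merging happen together in this sketch, a separate coalescing step in front of the consumer is not needed.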
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org