jayzhan211 opened a new issue, #15850: URL: https://github.com/apache/datafusion/issues/15850
### Is your feature request related to a problem or challenge?

```
statement count 0
create table t(a int) as values (1), (2);

query I
select count(distinct a) from t;
----
2

query TT
explain select count(distinct a) from t;
----
logical_plan
01)Projection: count(alias1) AS count(DISTINCT t.a)
02)--Aggregate: groupBy=[[]], aggr=[[count(alias1)]]
03)----Aggregate: groupBy=[[t.a AS alias1]], aggr=[[]]
04)------TableScan: t projection=[a]
physical_plan
01)ProjectionExec: expr=[count(alias1)@0 as count(DISTINCT t.a)]
02)--AggregateExec: mode=Final, gby=[], aggr=[count(alias1)]
03)----CoalescePartitionsExec
04)------AggregateExec: mode=Partial, gby=[], aggr=[count(alias1)]
05)--------AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[]
06)----------CoalesceBatchesExec: target_batch_size=8192
07)------------RepartitionExec: partitioning=Hash([alias1@0], 4), input_partitions=4
08)--------------RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
09)----------------AggregateExec: mode=Partial, gby=[a@0 as alias1], aggr=[]
10)------------------DataSourceExec: partitions=1, partition_sizes=[1]
```

As the plan shows, `count(distinct a)` is rewritten into a group-by on `a` followed by a plain `count`, so the specialized distinct accumulators are never used. I think we should execute with the specialized count distinct accumulators, such as `PrimitiveDistinctCountAccumulator`, `BytesDistinctCountAccumulator`, and `FloatDistinctCountAccumulator`. The current execution path looks quite complex and is probably not well optimized.

### Describe the solution you'd like

Investigate why the distinct count accumulator is not called and whether switching to it improves performance. ClickBench has queries that use `count(distinct)`, so we could benchmark against it to see whether the change helps.

### Describe alternatives you've considered

_No response_

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org