goldmedal commented on issue #15383:
URL: https://github.com/apache/datafusion/issues/15383#issuecomment-2763293892

@Dandandan
I have a draft https://github.com/goldmedal/datafusion/pull/3 based on #15423 for `HashAggregate`. Could you check if it's heading in the right direction?
   
When selection vector mode is enabled:
- `CoalesceBatchesExec` is not added for the `FinalPartitioned` aggregation.
- The selection vector is used to filter the required rows before merging batches.
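To illustrate the idea behind these two points, here is a minimal, self-contained sketch of hash repartitioning with a selection vector. It is not DataFusion's API: `SelectedBatch`, `repartition_with_selection`, `filter_selected`, and `partition_of` are hypothetical names, a single `Vec<i64>` column stands in for an Arrow `RecordBatch`, and `DefaultHasher` stands in for DataFusion's hashing. Each output partition shares the input batch through an `Arc` and only records which row indices hash to it; the consumer applies that selection vector before merging. Because no small per-partition batches are materialized, one plausible reading (an assumption, not stated in the draft) is that this is why `CoalesceBatchesExec` can be skipped here.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};
use std::sync::Arc;

/// Hypothetical stand-in for a record batch: a single i64 column.
type Column = Arc<Vec<i64>>;

/// A batch shared across output partitions, plus the row indices
/// (the selection vector) that belong to one partition.
#[derive(Debug, Clone)]
struct SelectedBatch {
    column: Column,
    selection: Vec<usize>,
}

/// Hypothetical helper: map a key to one of `num_partitions` partitions.
fn partition_of(value: i64, num_partitions: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    (hasher.finish() % num_partitions as u64) as usize
}

/// Route one input batch to `num_partitions` outputs without copying rows:
/// every output shares the same column and only records which rows it owns.
fn repartition_with_selection(column: Column, num_partitions: usize) -> Vec<SelectedBatch> {
    let mut selections = vec![Vec::<usize>::new(); num_partitions];
    for (row, &value) in column.iter().enumerate() {
        selections[partition_of(value, num_partitions)].push(row);
    }
    selections
        .into_iter()
        .map(|selection| SelectedBatch { column: Arc::clone(&column), selection })
        .collect()
}

/// Consumer side: apply the selection vector first, so only the rows this
/// partition owns are fed into the merge step.
fn filter_selected(batch: &SelectedBatch) -> Vec<i64> {
    batch.selection.iter().map(|&row| batch.column[row]).collect()
}

fn main() {
    let input: Column = Arc::new(vec![1, 1, 1, 1, 2, 2, 3, 3]);
    let outputs = repartition_with_selection(Arc::clone(&input), 4);

    // Every row lands in exactly one output partition.
    let total: usize = outputs.iter().map(|b| b.selection.len()).sum();
    assert_eq!(total, input.len());

    for (i, b) in outputs.iter().enumerate() {
        println!("partition {i}: rows {:?} -> values {:?}", b.selection, filter_selected(b));
    }
}
```

The trade-off sketched here is that the input batch stays alive until every partition has consumed its selection, in exchange for not copying rows at repartition time.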
   
   The plan looks like this:
```
> create table t(c int) as values (1), (1), (1), (1), (2), (2), (3), (3);
> explain select count(distinct c) from t;

+---------------+--------------------------------------------------------------------------------------------------+
| plan_type     | plan                                                                                             |
+---------------+--------------------------------------------------------------------------------------------------+
| logical_plan  | Projection: count(alias1) AS count(DISTINCT t.c)                                                 |
|               |   Aggregate: groupBy=[[]], aggr=[[count(alias1)]]                                                |
|               |     Aggregate: groupBy=[[t.c AS alias1]], aggr=[[]]                                              |
|               |       TableScan: t projection=[c]                                                                |
| physical_plan | ProjectionExec: expr=[count(alias1)@0 as count(DISTINCT t.c)]                                    |
|               |   AggregateExec: mode=Final, gby=[], aggr=[count(alias1)]                                        |
|               |     CoalescePartitionsExec                                                                       |
|               |       AggregateExec: mode=Partial, gby=[], aggr=[count(alias1)]                                  |
|               |         AggregateExec: mode=FinalPartitioned, gby=[alias1@0 as alias1], aggr=[]                  |
|               |           RepartitionExec: partitioning=HashSelectionVector([alias1@0], 12), input_partitions=12 |
|               |             RepartitionExec: partitioning=RoundRobinBatch(12), input_partitions=1                |
|               |               AggregateExec: mode=Partial, gby=[c@0 as alias1], aggr=[]                          |
|               |                 DataSourceExec: partitions=1, partition_sizes=[1]                                |
|               |                                                                                                  |
+---------------+--------------------------------------------------------------------------------------------------+
```
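As a rough aid for reading the plan bottom-up, the following toy model walks the `count(DISTINCT c)` example through the same stages. It is a single-threaded sketch under simplifying assumptions: an `i32` slice instead of Arrow batches, per-row rather than per-batch round robin, and `DefaultHasher` instead of DataFusion's hashing; `count_distinct` and `partition_of` are hypothetical names. The point it illustrates is why summing the per-partition counts is correct: equal `alias1` values always hash to the same partition, so the per-partition distinct sets are disjoint.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: map a group key to one of `n` output partitions.
fn partition_of(value: i32, n: usize) -> usize {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    (hasher.finish() % n as u64) as usize
}

/// Toy model of the physical plan for `count(DISTINCT c)`.
/// Each step names the operator it imitates; this is not DataFusion code.
fn count_distinct(input: &[i32], n: usize) -> usize {
    // AggregateExec: mode=Partial, gby=[c@0 as alias1]: early dedup.
    let mut seen = HashSet::new();
    let partial: Vec<i32> = input.iter().copied().filter(|v| seen.insert(*v)).collect();

    // RepartitionExec: RoundRobinBatch(n), simplified to per-row round robin.
    let mut round_robin = vec![Vec::<i32>::new(); n];
    for (i, v) in partial.into_iter().enumerate() {
        round_robin[i % n].push(v);
    }

    // RepartitionExec: HashSelectionVector([alias1@0], n) feeding
    // AggregateExec: mode=FinalPartitioned. Each input batch is kept whole;
    // per output partition only a list of row indices is built.
    let mut final_partitioned = vec![HashSet::<i32>::new(); n];
    for batch in &round_robin {
        let mut selections = vec![Vec::<usize>::new(); n];
        for (row, v) in batch.iter().enumerate() {
            selections[partition_of(*v, n)].push(row);
        }
        for (p, selection) in selections.iter().enumerate() {
            // Filter the batch by the selection vector before merging.
            final_partitioned[p].extend(selection.iter().map(|&row| batch[row]));
        }
    }

    // AggregateExec: mode=Partial counts per partition, then
    // CoalescePartitionsExec + AggregateExec: mode=Final sum the counts.
    final_partitioned.iter().map(|set| set.len()).sum()
}

fn main() {
    let c = [1, 1, 1, 1, 2, 2, 3, 3];
    // prints "count(DISTINCT c) = 3"
    println!("count(DISTINCT c) = {}", count_distinct(&c, 12));
}
```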
   
   I'll review more aggregation patterns and add additional tests.
   Thanks.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

