findepi commented on issue #17138:
URL: https://github.com/apache/datafusion/issues/17138#issuecomment-3178465418

   The problem can be reproduced before 
https://github.com/apache/datafusion/commit/876b504cb0770debda687c91a9ba7295291d26e2
 if the following rule
   
https://github.com/apache/datafusion/blob/9463cf6792469534a32830aefbba32218716fb15/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1067-L1074
   
   gets duplicated to cover `Plus` operator too.
   Shorter repro
   
   ```
   CREATE OR REPLACE TABLE tab1(col0 INTEGER, col1 INTEGER, col2 INTEGER);
   INSERT INTO tab1 VALUES(51,14,96);
   SELECT NULL * AVG(DISTINCT 4) + SUM(col1) AS col0 FROM tab1;
   ```
   
   Plan before 
https://github.com/apache/datafusion/commit/876b504cb0770debda687c91a9ba7295291d26e2
 has AVG eliminated, but SUM remains:
   
   ```
   
+---------------+-------------------------------------------------------------------------+
   | plan_type     | plan                                                       
             |
   
+---------------+-------------------------------------------------------------------------+
   | logical_plan  | Projection: Float64(NULL) + CAST(sum(tab1.col1) AS 
Float64) AS col0     |
   |               |   Aggregate: groupBy=[[]], aggr=[[sum(CAST(tab1.col1 AS 
Int64))]]       |
   |               |     TableScan: tab1 projection=[col1]                      
             |
   | physical_plan | ProjectionExec: expr=[NULL + CAST(sum(tab1.col1)@0 AS 
Float64) as col0] |
   |               |   AggregateExec: mode=Single, gby=[], 
aggr=[sum(tab1.col1)]             |
   |               |     DataSourceExec: partitions=1, partition_sizes=[3]      
             |
   |               |                                                            
             |
   
+---------------+-------------------------------------------------------------------------+
   ```
   
   
   
   On `876b504cb0770debda687c91a9ba7295291d26e2^` with added rule for `Plus` 
the SUM gets removed, but AVG remains
   
   
   ```
   
+---------------+----------------------------------------------------------------------------------------+
   | plan_type     | plan                                                       
                            |
   
+---------------+----------------------------------------------------------------------------------------+
   | logical_plan  | Projection: Float64(NULL) AS col0                          
                            |
   |               |   Aggregate: groupBy=[[]], aggr=[[avg(DISTINCT Float64(4)) 
AS avg(DISTINCT Int64(4))]] |
   |               |     TableScan: tab1 projection=[]                          
                            |
   | physical_plan | ProjectionExec: expr=[NULL as col0]                        
                            |
   |               |   AggregateExec: mode=Single, gby=[], aggr=[avg(DISTINCT 
Int64(4))]                    |
   |               |     DataSourceExec: partitions=1, partition_sizes=[3]      
                            |
   |               |                                                            
                            |
   
+---------------+----------------------------------------------------------------------------------------+
   ```
   
   Maybe there is some optimizer rule pruning unused aggregations that retains 
at least one aggregate function. When both become redundant, it doesn't know 
which one to prune?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to