findepi commented on issue #17138: URL: https://github.com/apache/datafusion/issues/17138#issuecomment-3178465418
The problem can be reproduced before https://github.com/apache/datafusion/commit/876b504cb0770debda687c91a9ba7295291d26e2 if the following rule https://github.com/apache/datafusion/blob/9463cf6792469534a32830aefbba32218716fb15/datafusion/optimizer/src/simplify_expressions/expr_simplifier.rs#L1067-L1074 gets duplicated to cover `Plus` operator too. Shorter repro ``` CREATE OR REPLACE TABLE tab1(col0 INTEGER, col1 INTEGER, col2 INTEGER); INSERT INTO tab1 VALUES(51,14,96); SELECT NULL * AVG(DISTINCT 4) + SUM(col1) AS col0 FROM tab1; ``` Plan before https://github.com/apache/datafusion/commit/876b504cb0770debda687c91a9ba7295291d26e2 has AVG eliminated, but SUM remains: ``` +---------------+-------------------------------------------------------------------------+ | plan_type | plan | +---------------+-------------------------------------------------------------------------+ | logical_plan | Projection: Float64(NULL) + CAST(sum(tab1.col1) AS Float64) AS col0 | | | Aggregate: groupBy=[[]], aggr=[[sum(CAST(tab1.col1 AS Int64))]] | | | TableScan: tab1 projection=[col1] | | physical_plan | ProjectionExec: expr=[NULL + CAST(sum(tab1.col1)@0 AS Float64) as col0] | | | AggregateExec: mode=Single, gby=[], aggr=[sum(tab1.col1)] | | | DataSourceExec: partitions=1, partition_sizes=[3] | | | | +---------------+-------------------------------------------------------------------------+ ``` On `876b504cb0770debda687c91a9ba7295291d26e2^` with added rule for `Plus` the SUM gets removed, but AVG remains ``` +---------------+----------------------------------------------------------------------------------------+ | plan_type | plan | +---------------+----------------------------------------------------------------------------------------+ | logical_plan | Projection: Float64(NULL) AS col0 | | | Aggregate: groupBy=[[]], aggr=[[avg(DISTINCT Float64(4)) AS avg(DISTINCT Int64(4))]] | | | TableScan: tab1 projection=[] | | physical_plan | ProjectionExec: expr=[NULL as col0] | | | AggregateExec: mode=Single, gby=[], aggr=[avg(DISTINCT Int64(4))] | | | DataSourceExec: partitions=1, partition_sizes=[3] | | | | +---------------+----------------------------------------------------------------------------------------+ ``` Maybe there is some optimizer rule pruning unused aggregations that retains at least one aggregate function. When both become redundant, it doesn't know which one to prune? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org