Sevenannn opened a new issue, #12980:
URL: https://github.com/apache/datafusion/issues/12980

   ### Describe the bug
   
   When the DataFusion logical planner builds the `AGGREGATE` plan, it adds 
additional columns to the group_expr based on functional dependencies. 
However, for queries that aggregate over a table obtained through a `UNION` 
operation, the functional dependencies of the inputs are still preserved in the 
schema of the UNION plan, even though they no longer hold after the UNION. This 
causes the wrong column to be added as a group-by column in the aggregation plan.
   
   ### To Reproduce
   
   Any query that aggregates over a UNION can trigger the issue. For example, the 
query below:
   
   ```SQL
   with t1 as (
       select i_manufact_id, count(*) as extra from item
       group by i_manufact_id
   ),
   t2 as (
       select i_manufact_id, count(*) as extra from item
       group by i_manufact_id
   )
   select i_manufact_id, sum(extra)
    from  (select * from t1
           union all
           select * from t2) tmp1
    group by i_manufact_id
    order by i_manufact_id;
   ```
   
   This leads to a logical plan with a wrong extra column (`tmp1.extra`) in the 
Aggregate's `groupBy`:
   `Aggregate: groupBy=[[tmp1.i_manufact_id, tmp1.extra]], aggr=[[sum(tmp1.extra)]]`
   
   ### Expected behavior
   
   The UNION logical plan shouldn't retain functional dependencies from the 
tables involved in the UNION. In the example below, both Table 1 and Table 2 
have the functional dependency `col1 -> col2`. However, for `select * from 
table1 UNION select * from table2`, the functional dependency `col1 -> col2` no 
longer holds: `a` maps to both 1 and 2, and `b` maps to both 2 and 4.
   Table 1:
   ```
   col1 | col2
   -----|-----
   a    | 1
   b    | 2
   ```
   Table 2:
   ```
   col1 | col2
   -----|-----
   a    | 2
   b    | 4
   ```
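
The claim can be checked mechanically. Below is a minimal sketch in plain Python (not DataFusion code; `fd_holds` is a hypothetical helper written for this illustration) that tests whether `col1 -> col2` holds for each table and for their union:

```python
def fd_holds(rows, determinant, dependent):
    """Return True if each `determinant` value maps to exactly one `dependent` value."""
    seen = {}
    for row in rows:
        key, value = row[determinant], row[dependent]
        # setdefault records the first value seen for this key; a later,
        # different value means the functional dependency is violated.
        if seen.setdefault(key, value) != value:
            return False
    return True

table1 = [{"col1": "a", "col2": 1}, {"col1": "b", "col2": 2}]
table2 = [{"col1": "a", "col2": 2}, {"col1": "b", "col2": 4}]
# The two tables share no identical rows, so UNION and UNION ALL give the same rows here.
union = table1 + table2

print(fd_holds(table1, "col1", "col2"))  # True
print(fd_holds(table2, "col1", "col2"))  # True
print(fd_holds(union, "col1", "col2"))   # False
```

Each input satisfies the dependency on its own, but the combined rows do not, which is why the dependency should not be carried over to the UNION schema.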
   
   
   ### Additional context
   
   This bug causes wrong results when running TPC-DS queries 33, 56, 60, and 66: 
duplicated groups appear in the results.
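
To illustrate how an extra group-by column produces duplicated groups, here is a small sketch in plain Python. It uses the rows of Table 1 and Table 2 from the "Expected behavior" section as illustrative data (not TPC-DS data), and `group_sum` is a hypothetical helper, not part of DataFusion:

```python
from collections import defaultdict

def group_sum(rows, keys, value):
    """Sum `value` per distinct combination of the `keys` columns."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[k] for k in keys)] += row[value]
    return dict(totals)

# Rows of Table 1 followed by rows of Table 2 (i.e. their UNION ALL).
rows = [
    {"col1": "a", "col2": 1}, {"col1": "b", "col2": 2},
    {"col1": "a", "col2": 2}, {"col1": "b", "col2": 4},
]

# Intended aggregation: GROUP BY col1 only.
expected = group_sum(rows, ["col1"], "col2")
# What happens when col2 is wrongly appended to the group-by columns.
buggy = group_sum(rows, ["col1", "col2"], "col2")

print(expected)  # {('a',): 3, ('b',): 6}
print(buggy)     # {('a', 1): 1, ('b', 2): 2, ('a', 2): 2, ('b', 4): 4}
```

With the extra column, `a` and `b` each appear in two groups instead of one, which matches the kind of duplicated groups described above.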


