comphead commented on issue #5695:
URL: 
https://github.com/apache/arrow-datafusion/issues/5695#issuecomment-1483550835

   You are right, the column names from the first UNION ALL branch are the driving ones.
   
   This case is not correct; the column names have to be `count, n_regionkey`:
   ```
   ❯ WITH w1 AS (select 1 as x , max(10) as y), w2 AS (select 5 as n_regionkey)
   select count(*) count, n_regionkey from w2 group by n_regionkey
   union all
   select x, y from w1 order by n_regionkey, count desc;
   +-------+----+
   | count | y  |
   +-------+----+
   | 1     | 5  |
   | 1     | 10 |
   +-------+----+
   ```
   If I remove the ORDER BY, I get an even more surprising result:
   ```
   ❯ WITH w1 AS (select 1 as x , max(10) as y), w2 AS (select 5 as n_regionkey)
   select count(*) count, n_regionkey from w2 group by n_regionkey
   union all
   select x, y from w1;
   +---+----+
   | x | y  |
   +---+----+
   | 1 | 10 |
   | 1 | 5  |
   +---+----+
   ```
   
   The bug is partially related to wrong column name derivation in UNION ALL:
   
   
   ```
   ❯ select  1 a, 2 b union all select 3 c, 4 d
   ;
   +---+---+
   | c | d |
   +---+---+
   | 3 | 4 |
   | 1 | 2 |
   +---+---+
   ```
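
For comparison, here is a minimal sketch that runs the same statement through SQLite using Python's standard `sqlite3` module. SQLite is a different engine and is used here only as a reference point for the expected naming, not as a statement about DataFusion internals; the helper name `union_all_column_names` is hypothetical.

```python
import sqlite3


def union_all_column_names(sql):
    """Run `sql` in an in-memory SQLite database and return the result column names."""
    conn = sqlite3.connect(":memory:")
    try:
        cursor = conn.execute(sql)
        # cursor.description holds one 7-tuple per result column; item 0 is the name.
        return [col[0] for col in cursor.description]
    finally:
        conn.close()


# Same query as in the report above: the first branch names the columns a and b.
names = union_all_column_names("select 1 a, 2 b union all select 3 c, 4 d")
print(names)  # ['a', 'b']
```

In this sketch the names come from the first branch (`a`, `b`), which is the behavior the fix is aiming for.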
   
   I will prepare a fix for UNION ALL first and then test other scenarios,
   such as non-deterministic column naming with and without ORDER BY.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
