xudong963 opened a new issue #1818:
URL: https://github.com/apache/arrow-datafusion/issues/1818


   **Describe the bug**
   When union some `select constant1, contant2, ...` similar queries, union 
can't produce the result.
   
   **To Reproduce**
   ```sql
   ❯ SELECT 1 as col1, 'one' as col2
   UNION ALL
   SELECT 2, 'two'
   UNION ALL
   SELECT 3, 'three';
   
   Plan("UNION ALL schemas are expected to be the same")
   
   ❯ SELECT 1 , 'one'
   UNION ALL
   SELECT 2, 'two'
   UNION ALL
   SELECT 3, 'three';
   
   Plan("UNION ALL schemas are expected to be the same")
   ```
   
   **Expected behavior**
   Let' see results from postgres:
   ```sql
   postgres=# SELECT 1 AS column1, 'one' AS column2
   UNION ALL
   SELECT 2, 'two'
   UNION ALL
   SELECT 3, 'three';
    column1 | column2
   ---------+---------
          1 | one
          2 | two
          3 | three
   (3 rows)
   
   postgres=# SELECT 1 , 'one'
   UNION ALL
   SELECT 2, 'two'
   UNION ALL
   SELECT 3, 'three';
    ?column? | ?column?
   ----------+----------
           1 | one
           2 | two
           3 | three
   (3 rows)
   ```
   
   **Additional context**
   I have done an easy analysis, the followings are my thought to solve the 
issue:
   1. We should fix the default name when the expr in projection doesn't have a 
name. Let's see an example. `select a from table1`, `a` is the column from 
`table1`, so it naturally has a name `a`, but for `select 1`, `1` isn't a 
column, it's a `Value` expr, so its name will be `Int64(1)` in current 
datatfusion's implementation, the related code is 
https://github.com/apache/arrow-datafusion/blob/7be0e268a69ffecbf06823c98ca572733dddb29e/datafusion/src/physical_plan/planner.rs#L91.
 So the first thing to solve the issue is to rethink how to create default 
name. I think we can keep consistent with postgres, such as `?column?`.
   2. Currently, DF judges if the logical plans in `union` have the same schema 
by arrow schema's field name 
https://github.com/apache/arrow-datafusion/blob/7be0e268a69ffecbf06823c98ca572733dddb29e/datafusion-common/src/dfschema.rs#L251.
 After finishing 1, we can add a condition, if the col name is `?column?`, we 
decide if logical plans have the same schema by arrow schema's filed datatype.
   
   
   cc @alamb @houqp 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to