liukun4515 commented on issue #4837:
URL: https://github.com/apache/arrow-datafusion/issues/4837#issuecomment-1377154862

   > I found that the bug is in `union type coercion` (#3513), and it still 
exists. We can reproduce it on the master branch:
   > 
   > ```sql
   > ❯ create table table_2(name text, id INT) as  values('Alex',1);
   > 0 rows in set. Query took 0.002 seconds.
   > ❯ create table table_1(name text, id TINYINT) as  values('Alex',1);
   > 0 rows in set. Query took 0.002 seconds.
   > ❯ (
   >     SELECT * FROM table_1
   >     EXCEPT
   >     SELECT * FROM table_2
   > )
   > UNION ALL
   > (
   >     SELECT * FROM table_2
   >     EXCEPT
   >     SELECT * FROM table_1
   > );
   > SchemaError(FieldNotFound { field: Column { relation: Some("table_2"), 
name: "id" }, valid_fields: Some([Column { relation: Some("table_1"), name: 
"name" }, Column { relation: Some("table_1"), name: "id" }]) })
   > ```
   > 
   > For a union operation, we need to ensure that the data types of the left 
and right inputs are the same. This is done in:
   > 
   > 
https://github.com/apache/arrow-datafusion/blob/71b9baecd0a3c881f96e9994d922f3c1b3d61854/datafusion/expr/src/expr_rewriter.rs#L523-L527
   > 
   > But it uses `plan.expressions()` to get the expressions to coerce for the 
input, which I think is not correct, because it may return other expressions. 
For example, a `join` returns its predicates, not the fields of its schema.
   
   > To fix this issue, I think we can abandon `plan.expressions()`, and use 
the input schema to enumerate the fields, finally create the new plan with 
`Projection::try_new_with_schema`.
   
   😭 I think this is the only way to resolve the union plan for now.
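
A minimal sketch of the schema-based approach quoted above, written as a toy model rather than against the DataFusion API. `DataType`, `Field`, `Expr`, `coerce`, `union_target_types` and `coerce_input` are all hypothetical names invented for illustration. The only point is that the projection for each union input is built from that input's schema fields, so join predicates that mention the other table never enter the picture.

```rust
// Toy model only: none of these types or functions are DataFusion's.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Int8,  // TINYINT
    Int32, // INT
    Utf8,  // TEXT
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    relation: &'static str,
    name: &'static str,
    data_type: DataType,
}

/// Expression emitted by the projection placed on top of a union input.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { relation: &'static str, name: &'static str },
    Cast { input: Box<Expr>, to: DataType },
}

/// Common type for one column position. Only the integer widening
/// needed by the repro is modelled here (an assumption of this sketch).
fn coerce(left: DataType, right: DataType) -> Option<DataType> {
    use DataType::*;
    match (left, right) {
        (a, b) if a == b => Some(a),
        (Int8, Int32) | (Int32, Int8) => Some(Int32),
        _ => None,
    }
}

/// Target column types of the union, computed position by position
/// from the two input schemas.
fn union_target_types(left: &[Field], right: &[Field]) -> Option<Vec<DataType>> {
    if left.len() != right.len() {
        return None;
    }
    left.iter()
        .zip(right)
        .map(|(l, r)| coerce(l.data_type, r.data_type))
        .collect()
}

/// Projection expressions for one input, enumerated from its schema
/// fields instead of from the plan's own expressions.
fn coerce_input(input_schema: &[Field], target: &[DataType]) -> Vec<Expr> {
    input_schema
        .iter()
        .zip(target)
        .map(|(field, &to)| {
            let column = Expr::Column { relation: field.relation, name: field.name };
            if field.data_type == to {
                column
            } else {
                Expr::Cast { input: Box::new(column), to }
            }
        })
        .collect()
}
```

With the schemas from the repro, the `table_1` side would get a cast on `table_1.id` to `Int32`, while the `table_2` side would keep plain column references. Every emitted column belongs to the input's own schema by construction.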
   
   

