ygf11 commented on issue #4837:
URL:
https://github.com/apache/arrow-datafusion/issues/4837#issuecomment-1375055044
I found that the bug is in the union type coercion (#3513), and it still
exists.
We can reproduce it on the master branch:
```sql
❯ create table table_2(name text, id INT) as values('Alex',1);
0 rows in set. Query took 0.002 seconds.
❯ create table table_1(name text, id TINYINT) as values('Alex',1);
0 rows in set. Query took 0.002 seconds.
❯ (
SELECT * FROM table_1
EXCEPT
SELECT * FROM table_2
)
UNION ALL
(
SELECT * FROM table_2
EXCEPT
SELECT * FROM table_1
);
SchemaError(FieldNotFound { field: Column { relation: Some("table_2"), name:
"id" }, valid_fields: Some([Column { relation: Some("table_1"), name: "name" },
Column { relation: Some("table_1"), name: "id" }]) })
```
For a union operation, we need to ensure that the data types of corresponding
left and right columns are the same.
This is done in:
https://github.com/apache/arrow-datafusion/blob/71b9baecd0a3c881f96e9994d922f3c1b3d61854/datafusion/expr/src/expr_rewriter.rs#L523-L527
But it uses `plan.expressions()` to get the fields (schema) of the input, which I
think is not correct, because it may return other expressions. For example, a
`join` returns its predicates, not its output fields.
To fix this issue, I think we can drop `plan.expressions()` and use the
input schema to enumerate the output fields, then create the new plan with
`Projection::try_new_with_schema`.
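
The idea can be sketched with a toy model. Everything below (`DataType`, `Field`, `Expr`, `Plan`, `coerce_from_schema`, `except_plan`) is a hypothetical stand-in written for illustration, not DataFusion's actual API, and it assumes the `EXCEPT` input behaves like a join-like node whose own expressions are predicates that reference both tables. The point is only that the coercion expressions are derived from the output schema, one per output field, so they never mention `table_2.id`.

```rust
// Toy model only: these types are hypothetical stand-ins, not DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int8,
    Int32,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    relation: &'static str,
    name: &'static str,
    data_type: DataType,
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { relation: &'static str, name: &'static str },
    Cast { expr: Box<Expr>, to: DataType },
}

/// A plan node whose own expressions are not the same as its output fields.
struct Plan {
    /// Output fields of the node.
    schema: Vec<Field>,
    /// The node's own expressions (here: join-like predicates).
    expressions: Vec<Expr>,
}

/// Build one projection expression per output field of `plan.schema`,
/// adding a cast wherever the field type differs from the target type.
/// `plan.expressions` is deliberately not consulted.
fn coerce_from_schema(plan: &Plan, target: &[DataType]) -> Result<Vec<Expr>, String> {
    if plan.schema.len() != target.len() {
        return Err(format!(
            "expected {} fields, got {}",
            target.len(),
            plan.schema.len()
        ));
    }
    Ok(plan
        .schema
        .iter()
        .zip(target)
        .map(|(field, wanted)| {
            let column = Expr::Column {
                relation: field.relation,
                name: field.name,
            };
            if &field.data_type == wanted {
                column
            } else {
                Expr::Cast {
                    expr: Box::new(column),
                    to: wanted.clone(),
                }
            }
        })
        .collect())
}

/// Models `SELECT * FROM table_1 EXCEPT SELECT * FROM table_2` under the
/// assumption stated above: the output has only table_1's fields, while the
/// node's own expressions reference columns of both tables.
fn except_plan() -> Plan {
    Plan {
        schema: vec![
            Field { relation: "table_1", name: "name", data_type: DataType::Utf8 },
            Field { relation: "table_1", name: "id", data_type: DataType::Int8 },
        ],
        expressions: vec![
            Expr::Column { relation: "table_1", name: "id" },
            Expr::Column { relation: "table_2", name: "id" },
        ],
    }
}
```

Calling `coerce_from_schema(&except_plan(), &[DataType::Utf8, DataType::Int32])` yields the `table_1.name` column unchanged and a cast of `table_1.id` to `Int32`; no `table_2` column appears, which is what the `FieldNotFound` error above was complaining about.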