waynexia opened a new issue, #9671:
URL: https://github.com/apache/arrow-datafusion/issues/9671
### Is your feature request related to a problem or challenge?
It's not always an expected behavior that casts a number to string in places
like `WHERE`. E.g.:
Create a table with an integer and a string columns.
```sql
create table t as (values (1,'1'), (2,'2'),(3,'3'));
```
Behavior for queries like the following is clear, as the provided constant
has the same type with column:
```sql
select * from t where column1 > 2;
select * from t where column2 > '2';
```
But when comparing `column1` against a string, like `'2'` or `'a'`,
`type_coercion` will add a cast to the column, resulting a plan like this:
```sql
❯ explain select * from t where column1 > '2';
+---------------+---------------------------------------------------+
| plan_type | plan |
+---------------+---------------------------------------------------+
| logical_plan | Filter: CAST(t.column1 AS Utf8) > Utf8("2") |
| | TableScan: t projection=[column1, column2] |
| physical_plan | CoalesceBatchesExec: target_batch_size=8192 |
| | FilterExec: CAST(column1@0 AS Utf8) > 2 |
| | MemoryExec: partitions=1, partition_sizes=[1] |
| | |
+---------------+---------------------------------------------------+
```
This behavior is a little confusing. It will not only slow the execution
(filter pushdown in parquet won't work), but the result will also be surprising
(`12` < `'2'` because we are comparing two strings).
### Describe the solution you'd like
Do not coerce everything to string when comparing another type with it. We
should be careful when coercion happens across two type families (like integer
to unsigned integer, or float to string etc.).
My proposal in this case is to consider the type of column first. E.g., try
to coerce the string literal into the type of column. And throw an error this
coercion fails. This can prevent us from executing an incorrect plan in
silence. As user can always add their own specific `CAST` if they want to do
string comparison.
This is also closer to PG's behavior:
```sql
postgres=# select 12 > '2';
?column?
----------
t
(1 row)
postgres=# select 12 > 'a';
ERROR: invalid input syntax for type integer: "a"
LINE 1: select 12 > 'a';
```
### Describe alternatives you've considered
MySQL would also try to cast string to integer. But it won't throw an error
when this cast fails. Instead, it would failover the string value to the
default number 0 which is more strange...
### Additional context
`String` is the "biggest" type in coercion, but in this case I realize it's
not always acceptable to cast something to string. I would start working on
`type_coercion` rule. I'm not sure if there are any other places that have a
similar issue.
And BTW this would be a breaking change.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]