gabotechs opened a new issue, #22587:
URL: https://github.com/apache/datafusion/issues/22587
### Describe the bug
Since PR #20426 ("Prefer numeric in type coercion for comparisons"), any
query that compares a numeric column with a string literal will fail to plan if
the string cannot be cast to that numeric type at planning time.
The regression is visible with ClickBench Q36 (and similar queries Q37–Q42),
where EventDate is stored as UInt16 (days since epoch) and compared with ISO
date strings:
```sql
SELECT "URL", COUNT(*) AS PageViews
FROM hits
WHERE "CounterID" = 62
AND "EventDate" >= '2013-07-01'
AND "EventDate" <= '2013-07-31'
AND "DontCountHits" = 0
AND "IsRefresh" = 0
AND "URL" <> ''
GROUP BY "URL"
ORDER BY PageViews DESC
LIMIT 10;
```
```
Optimizer rule 'simplify_expressions' failed: ArrowError(CastError("Cannot
cast string '2013-07-01' to value of UInt16 type"))
```
PR #20426 changed comparison_coercion(UInt16, Utf8) from returning Utf8 to
returning UInt16 (numeric wins). This flips the shape of the predicate the
optimizer receives:
-- Before #20426 (coercion target = Utf8)
FilterExec: CAST(EventDate AS Utf8) >= '2013-07-01'
-- After #20426 (coercion target = UInt16)
FilterExec: EventDate >= CAST('2013-07-01' AS UInt16)
The simplify_expressions optimizer rule then sees CAST(Literal('2013-07-01')
AS UInt16), an all-constant sub-expression, and tries to fold it at planning
time. The Arrow cast kernel cannot parse the ISO date string '2013-07-01' as a
UInt16 integer, so it errors, and planning fails.
Before PR #20426 the literal was never cast (it was already the target Utf8
type), so this code path was never reached.
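To make the failure mode concrete, here is a minimal toy model in Python. It is not DataFusion or Arrow code: `fold_cast_to_uint16` and `compare_as_strings` are hypothetical helpers that only mimic the two predicate shapes described above, and Python's `int()` stands in for the Arrow cast kernel as an assumption.

```python
def fold_cast_to_uint16(literal: str) -> int:
    """Toy stand-in for constant-folding CAST(<string literal> AS UInt16).

    Assumption: int() plays the role of the Arrow string-to-integer cast.
    """
    value = int(literal)  # raises ValueError for text such as an ISO date
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"{value} does not fit in UInt16")
    return value


def compare_as_strings(column_value: int, literal: str) -> bool:
    """Toy stand-in for CAST(column AS Utf8) >= literal (string comparison)."""
    return str(column_value) >= literal


# Shape after #20426: the literal is cast eagerly, so a date string errors
# before any row is read.
try:
    fold_cast_to_uint16("2013-07-01")
    folded_ok = True
except ValueError:
    folded_ok = False

# Shape before #20426: the literal is left alone and the column value is
# stringified per row, so nothing can fail while the plan is being built.
before_result = compare_as_strings(15887, "2013-07-01")
```

In this sketch, the second shape never attempts to parse the literal, which mirrors why the planning-time cast error could not occur before the coercion change.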
### To Reproduce
Added a reproducer here:
- https://github.com/apache/datafusion/pull/22586
Also, ClickBench queries 36 to 42 fail when run unmodified. Today they only
work because they are rewritten as views prior to execution
(https://github.com/apache/datafusion/pull/21498), but IMO any kind of rewrite
to a standardized benchmark defeats the purpose of the benchmark being
standard.
### Expected behavior
The ClickBench queries work out of the box, as they did in DataFusion 53.
### Additional context
_No response_