gabotechs opened a new issue, #22587:
URL: https://github.com/apache/datafusion/issues/22587

   ### Describe the bug
   
   Since PR #20426 ("Prefer numeric in type coercion for comparisons"), any 
query that compares a numeric column with a string literal will fail to plan if 
the string cannot be cast to that numeric type at planning time.
   
   The regression is visible with ClickBench Q36 (and similar queries Q37–Q42), 
where EventDate is stored as UInt16 (days since epoch) and compared with ISO 
date strings:
   
   ```sql
     SELECT "URL", COUNT(*) AS PageViews
     FROM hits
     WHERE "CounterID" = 62
       AND "EventDate" >= '2013-07-01'
       AND "EventDate" <= '2013-07-31'
       AND "DontCountHits" = 0
       AND "IsRefresh" = 0
       AND "URL" <> ''
     GROUP BY "URL"
     ORDER BY PageViews DESC
     LIMIT 10;
   ```
   
   ```
   Optimizer rule 'simplify_expressions' failed:  ArrowError(CastError("Cannot 
cast string '2013-07-01' to value of UInt16 type"))
   ```
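To make the type mismatch concrete: since EventDate holds days since epoch as UInt16, the two date strings in the query stand for small integers that fit the column type, but only a date-aware conversion can produce them. The sketch below is a std-only illustration, not DataFusion code; `days_from_civil` and `iso_date_to_epoch_days` are hypothetical helper names, and the calendar arithmetic is the well-known civil-date algorithm for the proleptic Gregorian calendar.

```rust
// Hypothetical helper: days between 1970-01-01 and the given civil date.
// Not part of DataFusion or Arrow; shown only to illustrate the encoding.
fn days_from_civil(y: i64, m: i64, d: i64) -> i64 {
    // Treat January and February as months 11 and 12 of the previous year.
    let y = if m <= 2 { y - 1 } else { y };
    let era = y.div_euclid(400);
    let yoe = y - era * 400; // year of era, 0..=399
    let mp = (m + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + d - 1; // day of the March-based year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day of era
    era * 146_097 + doe - 719_468
}

// Hypothetical helper: parse "YYYY-MM-DD" and return the day number if it
// fits in a u16, which is how the issue describes the EventDate column.
fn iso_date_to_epoch_days(s: &str) -> Option<u16> {
    let mut parts = s.split('-');
    let y: i64 = parts.next()?.parse().ok()?;
    let m: i64 = parts.next()?.parse().ok()?;
    let d: i64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() || !(1..=12).contains(&m) || !(1..=31).contains(&d) {
        return None;
    }
    let days = days_from_civil(y, m, d);
    if (0..=u16::MAX as i64).contains(&days) {
        Some(days as u16)
    } else {
        None
    }
}

fn main() {
    println!("{:?}", iso_date_to_epoch_days("2013-07-01")); // Some(15887)
    println!("{:?}", iso_date_to_epoch_days("2013-07-31")); // Some(15917)
}
```

Both bounds are valid UInt16 values, yet a plain string-to-integer cast cannot reach them because it knows nothing about dates.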
   
   PR #20426 changed comparison_coercion(UInt16, Utf8) from returning Utf8 to 
returning UInt16 (numeric wins). This flips the shape of the predicate the 
optimizer receives:
   
   -- Before #20426 (coercion target = Utf8)
   FilterExec: CAST(EventDate AS Utf8) >= '2013-07-01'
   -- After #20426 (coercion target = UInt16)
   FilterExec: EventDate >= CAST('2013-07-01' AS UInt16)
     
   The simplify_expressions optimizer rule then sees CAST(Literal('2013-07-01') 
AS UInt16), an all-constant sub-expression, and tries to fold it at planning 
time. The Arrow cast kernel cannot parse the ISO date string '2013-07-01' as a 
UInt16 integer, so it errors, and planning fails.
   
   Before PR #20426 the literal was never cast (it was already the target Utf8 
type), so this code path was never reached.
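The failure mode described above can be approximated with a minimal, std-only sketch. It is not the Arrow cast kernel or the `simplify_expressions` rule; `fold_cast_to_u16` is a hypothetical stand-in that assumes folding `CAST(<string literal> AS UInt16)` amounts to an integer parse, and it borrows the error text from the message quoted in this issue.

```rust
// Hypothetical stand-in for constant-folding CAST(<string literal> AS UInt16)
// at planning time. Assumption: the cast behaves like a plain integer parse.
fn fold_cast_to_u16(literal: &str) -> Result<u16, String> {
    literal.parse::<u16>().map_err(|_| {
        format!(
            "Cannot cast string '{}' to value of UInt16 type",
            literal
        )
    })
}

fn main() {
    // A numeric-looking literal folds cleanly; an ISO date string does not.
    for lit in ["62", "2013-07-01"] {
        match fold_cast_to_u16(lit) {
            Ok(v) => println!("{} folds to {}", lit, v),
            Err(e) => println!("{} fails to fold: {}", lit, e),
        }
    }
}
```

In this sketch, a predicate such as `"CounterID" = '62'` would still plan, while the `"EventDate"` comparisons would hit the error branch, which matches the behavior reported here.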
   
   ### To Reproduce
   
   Added a reproducer here:
   - https://github.com/apache/datafusion/pull/22586
   
   Also, ClickBench queries 36 to 42 fail when run unmodified. They currently 
pass only because they are rewritten as views prior to execution 
(https://github.com/apache/datafusion/pull/21498), but IMO any kind of rewrite 
of a standardized benchmark defeats the purpose of the benchmark being 
standard.
   
   ### Expected behavior
   
   The ClickBench queries work out of the box, as they did in DataFusion 53.
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

