adriangb opened a new pull request, #22887:
URL: https://github.com/apache/datafusion/pull/22887

   ## Which issue does this PR close?
   
   - Closes #22886.
   
   ## Rationale for this change
   
   `SIMILAR TO` panics with `failed to downcast array` whenever its operands 
end up as arrays of different physical types, for example a `NULL` pattern 
(which panics even at plan time, via constant folding) or a `Utf8View` input 
matched against a non-literal `Utf8` pattern. The root cause is twofold:
   
   1. `Expr::SimilarTo` was listed in the "nothing to coerce" arm of the 
`TypeCoercion` analyzer, so its operands were never coerced to a common string 
type (unlike `LIKE` and the regex binary operators).
   2. `regex_match_dyn` picks the downcast type from the left array and 
downcasts the right array to that same type with an unchecked `.expect()`, 
panicking on any mismatch.
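
   The second failure mode, and the "return an error instead of panicking" 
alternative, can be illustrated with a minimal std-only sketch. The types 
`Utf8Array` and `Utf8ViewArray` and the helper `downcast_checked` are 
hypothetical stand-ins invented for this example; they are not the Arrow 
arrays or the DataFusion macros touched by this PR.

```rust
use std::any::Any;

// Toy stand-ins for two physically different string arrays
// (hypothetical; not the real Arrow array types).
#[derive(Debug)]
struct Utf8Array(Vec<String>);
#[derive(Debug)]
struct Utf8ViewArray(Vec<String>);

/// Checked downcast: reports a mismatch as an error instead of panicking.
/// An `.expect()` on the same `downcast_ref` call would panic here instead.
fn downcast_checked<'a, T: Any>(array: &'a dyn Any, side: &str) -> Result<&'a T, String> {
    array
        .downcast_ref::<T>()
        .ok_or_else(|| format!("Internal error: failed to downcast {side} array"))
}

fn main() {
    let left: Box<dyn Any> = Box::new(Utf8Array(vec!["abc".to_string()]));
    let right: Box<dyn Any> = Box::new(Utf8ViewArray(vec!["a%".to_string()]));

    // The left operand determines the target type and downcasts cleanly.
    assert!(downcast_checked::<Utf8Array>(&*left, "left").is_ok());

    // The right operand has a different physical type, so the checked
    // downcast yields an error value rather than aborting the query.
    let err = downcast_checked::<Utf8Array>(&*right, "right").unwrap_err();
    println!("{err}");
}
```

   With coercion in place the mismatch should not occur in the first place; 
the checked downcast only covers cases that slip past the analyzer.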
   
   ## What changes are included in this PR?
   
   - `TypeCoercionRewriter` now coerces `Expr::SimilarTo` operands to a common 
string type using `regex_coercion`, mirroring the existing `Expr::Like` 
handling (including the `Dictionary(_, Utf8)` left-hand-side exception that 
preserves the dictionary fast path).
   - The `regexp_is_match_flag!` / `regexp_is_match_flag_scalar!` kernels in 
`physical-expr` now return an internal error instead of panicking if a downcast 
fails (defense in depth).
   - The SQL planner's `SIMILAR TO` pattern type check now also accepts 
`LargeUtf8` and `Utf8View` patterns (previously only `Utf8` and `NULL`); the 
coercion added above makes these work.
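
   As a rough illustration of what "coerce to a common string type" means, 
the sketch below models the idea with a toy enum. The `Ty` enum, the 
`common_string_type` function, and the precedence it uses (`Utf8View` over 
`LargeUtf8` over `Utf8`, with `NULL` adopting the other side's type) are 
assumptions made for this example, not the actual `regex_coercion` rules; 
only the error message wording is taken from this PR.

```rust
/// Toy model of the operand types involved (not Arrow's `DataType`).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Ty {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Int64,
}

/// Pick one string type for both operands, or fail at planning time.
/// The precedence encoded in `rank` is an assumption of this sketch.
fn common_string_type(lhs: Ty, rhs: Ty) -> Result<Ty, String> {
    use Ty::*;
    let rank = |t: Ty| match t {
        Utf8 => Some(1),
        LargeUtf8 => Some(2),
        Utf8View => Some(3),
        Null | Int64 => None,
    };
    match (lhs, rhs) {
        // Both NULL: this sketch arbitrarily falls back to Utf8.
        (Null, Null) => Ok(Utf8),
        // A NULL operand adopts the string type of the other side.
        (Null, other) | (other, Null) if rank(other).is_some() => Ok(other),
        _ => match (rank(lhs), rank(rhs)) {
            // Two string types: keep the higher-ranked one.
            (Some(l), Some(r)) => Ok(if l >= r { lhs } else { rhs }),
            // Anything else has no common string type.
            _ => Err(format!(
                "There isn't a common type to coerce {lhs:?} and {rhs:?} in SIMILAR TO expression"
            )),
        },
    }
}

fn main() {
    assert_eq!(common_string_type(Ty::Utf8View, Ty::Utf8), Ok(Ty::Utf8View));
    assert_eq!(common_string_type(Ty::Utf8, Ty::Null), Ok(Ty::Utf8));
    assert!(common_string_type(Ty::Int64, Ty::Utf8).is_err());
}
```

   Once both operands share one type, a kernel that picks its downcast type 
from the left array finds the same type on the right.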
   
   ## Are these changes tested?
   
   Yes:
   
   - New sqllogictest cases in `test_files/strings.slt` covering `NULL` 
patterns / inputs (now return `NULL` instead of panicking), and `Utf8View` / 
`LargeUtf8` / `Utf8` inputs matched against non-literal patterns of a different 
string type (previously panicked).
   - New `similar_to_for_type_coercion` unit test in `type_coercion.rs` 
following the existing `like_for_type_coercion` pattern, including the error 
case for non-string operands.
   
   ## Are there any user-facing changes?
   
   - Queries that previously panicked (`'a' SIMILAR TO NULL`, mixed string 
types with non-literal patterns) now return `NULL` or the correct result.
   - Non-string `SIMILAR TO` operands now produce a planning error (`There 
isn't a common type to coerce ... in SIMILAR TO expression`) instead of a 
runtime panic.
   - `LargeUtf8` and `Utf8View` patterns are now accepted by the SQL planner.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

