adriangb opened a new pull request, #22887:
URL: https://github.com/apache/datafusion/pull/22887
## Which issue does this PR close?

- Closes #22886.

## Rationale for this change

`SIMILAR TO` panics with `failed to downcast array` whenever its operands end up as arrays of different physical types. Two examples are a `NULL` pattern, which panics even at plan time through constant folding, and a `Utf8View` input matched against a non-literal `Utf8` pattern.

The root cause is twofold:

1. `Expr::SimilarTo` was listed in the "nothing to coerce" arm of the `TypeCoercion` analyzer, so its operands were never coerced to a common string type (unlike `LIKE` and the regex binary operators).
2. `regex_match_dyn` picks the downcast type from the left array. It then downcasts the right array to that same type with an unchecked `.expect()`, which panics on any mismatch.

## What changes are included in this PR?

- `TypeCoercionRewriter` now coerces `Expr::SimilarTo` operands to a common string type using `regex_coercion`, mirroring the existing `Expr::Like` handling. This includes the `Dictionary(_, Utf8)` left-hand-side exception that preserves the dictionary fast path.
- The `regexp_is_match_flag!` / `regexp_is_match_flag_scalar!` kernels in `physical-expr` now return an internal error instead of panicking if a downcast fails (defense in depth).
- The SQL planner's `SIMILAR TO` pattern type check now also accepts `LargeUtf8` and `Utf8View` patterns. Previously it accepted only `Utf8` and `NULL`. The coercion described above makes these new pattern types work.

## Are these changes tested?

Yes:

- New sqllogictest cases in `test_files/strings.slt` cover `NULL` patterns and inputs, which now return `NULL` instead of panicking. They also cover `Utf8View` / `LargeUtf8` / `Utf8` inputs matched against non-literal patterns of a different string type, which previously panicked.
- A new `similar_to_for_type_coercion` unit test in `type_coercion.rs` follows the existing `like_for_type_coercion` pattern and includes the error case for non-string operands.

## Are there any user-facing changes?

- Queries that previously panicked now return results. `NULL` operands such as `'a' SIMILAR TO NULL` return `NULL`, and mixed string types with non-literal patterns return the correct result.
- Non-string `SIMILAR TO` operands now produce a planning error (`There isn't a common type to coerce ... in SIMILAR TO expression`) instead of a runtime panic.
- The SQL planner now accepts `LargeUtf8` and `Utf8View` patterns.
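The coercion rule described under "What changes are included in this PR?" can be modeled roughly as follows. This is a minimal, std-only sketch and not DataFusion's code. `StrType` is a hypothetical stand-in for Arrow's `DataType`. `common_string_type` and `coerce_similar_to` are hypothetical helpers, not DataFusion's `regex_coercion`. The precedence rule (`Utf8View` over `LargeUtf8` over `Utf8`, with `NULL` adopting the other side's type) is an assumption chosen for illustration.

```rust
// Hypothetical, simplified model of SIMILAR TO operand coercion.
// `StrType` stands in for Arrow's `DataType`; the precedence rules are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum StrType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
    /// Dictionary-encoded values of the boxed type (key type omitted).
    Dictionary(Box<StrType>),
}

fn is_string(t: &StrType) -> bool {
    matches!(t, StrType::Utf8 | StrType::LargeUtf8 | StrType::Utf8View)
}

/// Assumed precedence: Utf8View > LargeUtf8 > Utf8; NULL adopts the other side.
fn common_string_type(lhs: &StrType, rhs: &StrType) -> Option<StrType> {
    use StrType::*;
    match (lhs, rhs) {
        // Dictionaries are compared by their value type.
        (Dictionary(inner), other) | (other, Dictionary(inner)) => common_string_type(inner, other),
        (Null, Null) => Some(Utf8),
        (Null, t) | (t, Null) if is_string(t) => Some(t.clone()),
        (Utf8View, r) if is_string(r) => Some(Utf8View),
        (l, Utf8View) if is_string(l) => Some(Utf8View),
        (LargeUtf8, r) if is_string(r) => Some(LargeUtf8),
        (l, LargeUtf8) if is_string(l) => Some(LargeUtf8),
        (Utf8, Utf8) => Some(Utf8),
        // Non-string operands have no common type.
        _ => None,
    }
}

/// Target types for `expr SIMILAR TO pattern`; `expr_cast == None` means
/// the left side is left uncast (the dictionary fast path).
#[derive(Debug, PartialEq)]
struct Coerced {
    expr_cast: Option<StrType>,
    pattern_cast: StrType,
}

fn coerce_similar_to(expr: &StrType, pattern: &StrType) -> Result<Coerced, String> {
    let common = common_string_type(expr, pattern).ok_or_else(|| {
        format!("There isn't a common type to coerce {expr:?} and {pattern:?} in SIMILAR TO expression")
    })?;
    // Mirror the LIKE rule: a Dictionary(_, Utf8) left side keeps its encoding.
    let expr_cast = match expr {
        StrType::Dictionary(inner) if **inner == StrType::Utf8 => None,
        _ => Some(common.clone()),
    };
    Ok(Coerced { expr_cast, pattern_cast: common })
}

fn main() {
    use StrType::*;
    // A Utf8View input against a Utf8 pattern: both sides become Utf8View.
    let c = coerce_similar_to(&Utf8View, &Utf8).unwrap();
    assert_eq!(c.expr_cast, Some(Utf8View));
    assert_eq!(c.pattern_cast, Utf8View);
    // A NULL pattern is given the input's string type.
    assert_eq!(coerce_similar_to(&Utf8, &Null).unwrap().pattern_cast, Utf8);
    // A Dictionary(_, Utf8) left side is not cast.
    let d = coerce_similar_to(&Dictionary(Box::new(Utf8)), &LargeUtf8).unwrap();
    assert_eq!(d.expr_cast, None);
    // Non-string operands are a planning error rather than a runtime panic.
    assert!(coerce_similar_to(&Int32, &Utf8).is_err());
}
```

In this model, returning `None` for the left-hand cast represents the `Dictionary(_, Utf8)` exception. The dictionary-encoded input keeps its encoding, while the pattern is still cast to the common type.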
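The defense-in-depth change to the `regexp_is_match_flag!` kernels amounts to replacing an unchecked downcast with a checked one. The sketch below shows that pattern using `std::any::Any::downcast_ref`. It is an illustration under stated assumptions. `Utf8Array`, `Utf8ViewArray`, `StringValues`, `KernelError`, `match_same_type` and `is_match` are hypothetical stand-ins for the Arrow arrays and the real kernels. `str::contains` is only a placeholder for actual `SIMILAR TO` regex matching.

```rust
use std::any::{type_name, Any};

// Hypothetical stand-ins for two Arrow string array types.
struct Utf8Array(Vec<Option<String>>);
struct Utf8ViewArray(Vec<Option<String>>);

/// Hypothetical error type; DataFusion would use its own internal error.
#[derive(Debug, PartialEq)]
enum KernelError {
    Internal(String),
}

/// Uniform access to the nullable string values of either array type.
trait StringValues: Any {
    fn values(&self) -> &[Option<String>];
}

impl StringValues for Utf8Array {
    fn values(&self) -> &[Option<String>] {
        &self.0
    }
}

impl StringValues for Utf8ViewArray {
    fn values(&self) -> &[Option<String>] {
        &self.0
    }
}

/// Downcasts `right` to the left array's type, returning an error on a
/// mismatch instead of panicking via `.expect()`.
fn match_same_type<T: StringValues>(
    left: &T,
    right: &dyn Any,
) -> Result<Vec<Option<bool>>, KernelError> {
    let right = right.downcast_ref::<T>().ok_or_else(|| {
        KernelError::Internal(format!("failed to downcast right array to {}", type_name::<T>()))
    })?;
    Ok(left
        .values()
        .iter()
        .zip(right.values())
        .map(|(s, p)| match (s, p) {
            // `contains` is a placeholder for real SIMILAR TO (regex) matching.
            (Some(s), Some(p)) => Some(s.contains(p.as_str())),
            // A NULL input or pattern yields NULL.
            _ => None,
        })
        .collect())
}

/// Picks the concrete type from the left array, then checks the right one.
fn is_match(left: &dyn Any, right: &dyn Any) -> Result<Vec<Option<bool>>, KernelError> {
    if let Some(l) = left.downcast_ref::<Utf8Array>() {
        match_same_type(l, right)
    } else if let Some(l) = left.downcast_ref::<Utf8ViewArray>() {
        match_same_type(l, right)
    } else {
        Err(KernelError::Internal("unsupported left array type".to_string()))
    }
}

fn main() {
    let input = Utf8ViewArray(vec![Some("abc".into()), None]);
    let same_type = Utf8ViewArray(vec![Some("b".into()), Some("a".into())]);
    assert_eq!(is_match(&input, &same_type), Ok(vec![Some(true), None]));

    // Mismatched physical types: an error value instead of a panic.
    let other_type = Utf8Array(vec![Some("b".into()), Some("a".into())]);
    assert!(is_match(&input, &other_type).is_err());
}
```

When the two types agree, the matching path is unchanged. Once operands are coerced during planning, the error branch is intended only as a backstop, which is why the sketch reports an internal error rather than a user-facing one.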
