mvanhorn opened a new pull request, #22992: URL: https://github.com/apache/datafusion/pull/22992
## Which issue does this PR close?

- Closes #22886.

## Rationale for this change

`SIMILAR TO` panics with `failed to downcast array` whenever its two operands resolve to arrays of different physical string types, for example a `Utf8View` column matched against a non-literal `Utf8` pattern, or a `NULL` pattern that produces a `NullArray`. A `NULL` pattern even panics at plan time during constant folding.

The cause is that, unlike `LIKE`/`ILIKE` and the regex operators, the `TypeCoercion` analyzer never coerces `Expr::SimilarTo` operands to a common type. `Expr::SimilarTo(_)` was listed in the "nothing to coerce" no-op arm of `TypeCoercionRewriter::f_up`, so both operands reached the executing kernel unchanged. That kernel picks the downcast type from the left array and `.expect()`s the right array to match, which panics when they differ. Literal patterns take the scalar fast path, which is why the common `col SIMILAR TO 'pattern'` form works and this went unnoticed.

## What changes are included in this PR?

- Removed `Expr::SimilarTo(_)` from the no-op coercion arm.
- Added a dedicated `Expr::SimilarTo` coercion arm in `datafusion/optimizer/src/analyzer/type_coercion.rs` that mirrors the existing `Expr::Like` arm: it computes the operand types, finds the common string type via `like_coercion`, and casts both operands to it (preserving the same `Dictionary(_, Utf8)` short-circuit the `Like` arm uses).

This guarantees both operands reach the regex kernel as the same physical type and coerces `NULL` patterns into the common type, fixing both the execution-time and plan-time panics. The no-common-type error message uses the `SIMILAR TO` operator name.

## Are these changes tested?

Yes.

- A new unit test `similar_to_for_type_coercion` in `type_coercion.rs` (next to `like_for_type_coercion`) covers the literal-pattern, `NULL`-pattern, and no-common-type-error cases.
- New end-to-end coverage in `datafusion/sqllogictest/test_files/type_coercion.slt` exercises a `Utf8View` column matched against a non-literal `Utf8` pattern column and a `NULL` pattern, both of which previously panicked and now return correct results.

## Are there any user-facing changes?

`SIMILAR TO` queries that previously panicked now plan and execute correctly. There are no API changes.
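
For readers unfamiliar with the analyzer, here is a minimal, self-contained sketch of the idea behind the fix. It is a toy model, not the code in this PR: `ToyType`, `toy_common_string_type`, `toy_kernel_accepts`, and `toy_coerce_similar_to` are hypothetical names, the precedence among string types is an assumption, `Int32` stands in for an arbitrary non-string type, and the `Dictionary(_, Utf8)` short-circuit is not modeled. The real rules live in `like_coercion`.

```rust
// Toy model only: illustrative stand-ins, not DataFusion's or Arrow's API.

/// Simplified stand-in for the data types mentioned in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ToyType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32, // arbitrary non-string type, used for the error case
}

impl ToyType {
    fn is_string(self) -> bool {
        matches!(self, ToyType::Utf8 | ToyType::LargeUtf8 | ToyType::Utf8View)
    }
}

/// Hypothetical, simplified analogue of `like_coercion`: pick one string type
/// that both operands can be cast to.
/// Assumed precedence for this sketch: Utf8View > LargeUtf8 > Utf8.
pub fn toy_common_string_type(lhs: ToyType, rhs: ToyType) -> Option<ToyType> {
    use ToyType::*;
    match (lhs, rhs) {
        // A NULL operand adopts the other side's string type.
        (Null, t) | (t, Null) if t.is_string() => Some(t),
        (a, b) if a.is_string() && b.is_string() => {
            if a == Utf8View || b == Utf8View {
                Some(Utf8View)
            } else if a == LargeUtf8 || b == LargeUtf8 {
                Some(LargeUtf8)
            } else {
                Some(Utf8)
            }
        }
        // Everything else (including cases this sketch does not model)
        // has no common type here.
        _ => None,
    }
}

/// Models the kernel's requirement described above: the array type is taken
/// from the left operand and the right operand must match it exactly.
pub fn toy_kernel_accepts(lhs: ToyType, rhs: ToyType) -> bool {
    lhs.is_string() && lhs == rhs
}

/// Models the new analyzer arm: cast both operands to the common type, or
/// return an error that names the `SIMILAR TO` operator.
pub fn toy_coerce_similar_to(
    lhs: ToyType,
    rhs: ToyType,
) -> Result<(ToyType, ToyType), String> {
    let common = toy_common_string_type(lhs, rhs)
        .ok_or_else(|| format!("no common type for {lhs:?} SIMILAR TO {rhs:?}"))?;
    Ok((common, common))
}
```

In this model, `toy_kernel_accepts(ToyType::Utf8View, ToyType::Utf8)` is `false`, which corresponds to the mismatch that caused the panic, while the pair returned by `toy_coerce_similar_to` for the same inputs always satisfies the kernel's requirement.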
