mvanhorn opened a new pull request, #22992:
URL: https://github.com/apache/datafusion/pull/22992

   ## Which issue does this PR close?
   
   - Closes #22886.
   
   ## Rationale for this change
   
   `SIMILAR TO` panics with `failed to downcast array` whenever its two 
operands resolve to arrays of different physical string types, for example a 
`Utf8View` column matched against a non-literal `Utf8` pattern, or a `NULL` 
pattern that produces a `NullArray`. A `NULL` pattern even panics at plan time 
during constant folding.
   
   The cause is that, unlike `LIKE`/`ILIKE` and the regex operators, the 
`TypeCoercion` analyzer never coerces `Expr::SimilarTo` operands to a common 
type. `Expr::SimilarTo(_)` was listed in the "nothing to coerce" no-op arm of 
`TypeCoercionRewriter::f_up`, so both operands reached the executing kernel 
unchanged. That kernel picks the downcast type from the left array and 
`.expect()`s the right array to match, which panics when they differ. Literal 
patterns take the scalar fast path, which is why the common `col SIMILAR TO 
'pattern'` form works and the bug went unnoticed.
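
The failure mode described above can be illustrated with a small, std-only sketch. It is not the DataFusion kernel: `Utf8Array`, `Utf8ViewArray`, and `try_match_same_type` are hypothetical stand-ins, plain string equality replaces regex matching, and the mismatch is returned as an `Err` where the real code's `.expect()` would panic.

```rust
use std::any::Any;

// Hypothetical stand-ins for two physical string array types.
// These are NOT Arrow types; they only model "same logical data, different layout".
struct Utf8Array(Vec<String>);
struct Utf8ViewArray(Vec<String>);

/// Picks the concrete type from `left` and assumes `right` has the same one.
/// Returns the number of positions where value and pattern are equal
/// (equality stands in for regex matching in this sketch).
fn try_match_same_type(left: &dyn Any, right: &dyn Any) -> Result<usize, String> {
    if let Some(l) = left.downcast_ref::<Utf8ViewArray>() {
        // The real kernel would `.expect()` here, which is where the panic comes from.
        let r = right
            .downcast_ref::<Utf8ViewArray>()
            .ok_or_else(|| "failed to downcast array".to_string())?;
        Ok(l.0.iter().zip(&r.0).filter(|(v, p)| v == p).count())
    } else if let Some(l) = left.downcast_ref::<Utf8Array>() {
        let r = right
            .downcast_ref::<Utf8Array>()
            .ok_or_else(|| "failed to downcast array".to_string())?;
        Ok(l.0.iter().zip(&r.0).filter(|(v, p)| v == p).count())
    } else {
        Err("unsupported left type".to_string())
    }
}
```

Passing a `Utf8ViewArray` on the left and a `Utf8Array` on the right yields the `Err` branch, while two arrays of the same type are compared normally. Coercing both operands to one type before they reach the kernel makes the mismatch branch unreachable.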
   
   ## What changes are included in this PR?
   
   - Removed `Expr::SimilarTo(_)` from the no-op coercion arm.
   - Added a dedicated `Expr::SimilarTo` coercion arm in 
`datafusion/optimizer/src/analyzer/type_coercion.rs` that mirrors the existing 
`Expr::Like` arm: it computes the operand types, finds the common string type 
via `like_coercion`, and casts both operands to it (preserving the same 
`Dictionary(_, Utf8)` short-circuit the `Like` arm uses). This guarantees both 
operands reach the regex kernel as the same physical type and coerces `NULL` 
patterns into the common type, fixing both the execution-time and plan-time 
panics. The no-common-type error message uses the `SIMILAR TO` operator name.
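
As a rough illustration of the coercion step described above, here is a toy, std-only model. It does not use DataFusion's types or its `like_coercion` function: `DataType`, `Expr`, `common_string_type`, `coerce_similar_to`, the precedence order among string types, and the error text are all assumptions made for this sketch, and the `Dictionary(_, Utf8)` short-circuit is omitted.

```rust
// Toy model of operand types; not arrow's DataType.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum DataType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
    Int32,
}

// Toy expression tree with just enough structure to show inserted casts.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Expr {
    Column(&'static str, DataType),
    Cast(Box<Expr>, DataType),
}

impl Expr {
    fn data_type(&self) -> DataType {
        match self {
            Expr::Column(_, t) | Expr::Cast(_, t) => *t,
        }
    }

    /// Wraps the expression in a cast unless it already has the target type.
    fn cast_to(self, target: DataType) -> Expr {
        if self.data_type() == target {
            self
        } else {
            Expr::Cast(Box::new(self), target)
        }
    }
}

// Assumed precedence for this sketch: Utf8 < LargeUtf8 < Utf8View.
fn string_rank(t: DataType) -> Option<u8> {
    match t {
        DataType::Utf8 => Some(1),
        DataType::LargeUtf8 => Some(2),
        DataType::Utf8View => Some(3),
        _ => None,
    }
}

/// Hypothetical stand-in for the common-type lookup: NULL adopts the other
/// side's string type; two string types resolve to the higher-ranked one.
fn common_string_type(l: DataType, r: DataType) -> Option<DataType> {
    match (l, r) {
        (DataType::Null, t) | (t, DataType::Null) => string_rank(t).map(|_| t),
        _ => {
            let (lr, rr) = (string_rank(l)?, string_rank(r)?);
            Some(if lr >= rr { l } else { r })
        }
    }
}

/// Casts both operands to the common type, or reports that none exists.
fn coerce_similar_to(expr: Expr, pattern: Expr) -> Result<(Expr, Expr), String> {
    let (lt, rt) = (expr.data_type(), pattern.data_type());
    let common = common_string_type(lt, rt).ok_or_else(|| {
        format!("no common type for {:?} and {:?} in SIMILAR TO expression", lt, rt)
    })?;
    Ok((expr.cast_to(common), pattern.cast_to(common)))
}
```

In this model, a `Utf8View` column paired with a `Utf8` pattern column leaves the left side untouched and wraps the pattern in a cast to `Utf8View`; a `Null` pattern is cast the same way, and an `Int32` operand produces the error.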
   
   ## Are these changes tested?
   
   Yes.
   
   - A new unit test `similar_to_for_type_coercion` in `type_coercion.rs` (next 
to `like_for_type_coercion`) covers the literal-pattern, `NULL`-pattern, and 
no-common-type-error cases.
   - New end-to-end coverage in 
`datafusion/sqllogictest/test_files/type_coercion.slt` exercises a `Utf8View` 
column matched against a non-literal `Utf8` pattern column and a `NULL` 
pattern, both of which previously panicked and now return correct results.
   
   ## Are there any user-facing changes?
   
   `SIMILAR TO` queries that previously panicked now plan and execute 
correctly. There are no API changes.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

