adriangb opened a new issue, #22886:
URL: https://github.com/apache/datafusion/issues/22886

   ### Describe the bug
   
   `SIMILAR TO` panics with `failed to downcast array` whenever its two operands reach the kernel as arrays of different physical types. Unlike `LIKE` and the regex binary operators (`~`, `~*`, ...), the `TypeCoercion` analyzer never coerces `Expr::SimilarTo` operands to a common string type (the variant is listed in the "nothing to coerce" arm of `TypeCoercionRewriter`). The kernel that ultimately executes it (`regex_match_dyn` in `datafusion/physical-expr/src/expressions/binary/kernels.rs`) picks the downcast type from the **left** array and then downcasts the right array to that same type with an unchecked `.expect()`:
   
   ```rust
   macro_rules! regexp_is_match_flag {
       ($LEFT:expr, $RIGHT:expr, $ARRAYTYPE:ident, $NOT:expr, $FLAG:expr) => {{
           let ll = $LEFT.as_any().downcast_ref::<$ARRAYTYPE>().expect("failed to downcast array");
           let rr = $RIGHT.as_any().downcast_ref::<$ARRAYTYPE>().expect("failed to downcast array");
           ...
   ```
   
   Literal patterns take a scalar fast path (`regex_match_dyn_scalar`), which 
is why the common `col SIMILAR TO 'pattern'` form works and this went 
unnoticed. Any mismatch that reaches the array-array path panics, e.g.:
   
   - a `NULL` pattern (right side is a `NullArray`) — this even panics at 
*plan* time via constant folding (`ConstEvaluator`),
   - a `Utf8View` column matched against a non-literal `Utf8` pattern (or any 
other string-type combination).
   
   This is likely the same root cause as #15461, which was closed as not 
reproducible.
   
   We hit this in production with `lower(col) SIMILAR TO '...'`-shaped queries 
over `Utf8View` columns.
   
   ### To Reproduce
   
   Reproduced on current `main` (3f52debc53, 2026-06-10) with `datafusion-cli`.
   
   **1. NULL pattern (panics during planning, via constant folding):**
   
   ```sql
   SELECT 'a' SIMILAR TO NULL;
   ```
   
   ```
   thread 'main' panicked at datafusion/physical-expr/src/expressions/binary/kernels.rs:284:13:
   failed to downcast array
   ```
   
   Backtrace excerpt:
   
   ```
    5: datafusion_physical_expr::expressions::binary::kernels::regex_match_dyn
    6: datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_with_resolved_args
    8: datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar
   ```
   
   **2. Utf8View vs non-literal Utf8 pattern (panics at runtime):**
   
   ```sql
   CREATE TABLE t AS SELECT * FROM (VALUES ('user auth failed')) v(s);
   CREATE TABLE p AS SELECT * FROM (VALUES ('(auth|login)')) v(pat);
   SELECT arrow_cast(t.s, 'Utf8View') SIMILAR TO p.pat FROM t CROSS JOIN p;
   ```
   
   ```
   thread 'main' panicked at datafusion/physical-expr/src/expressions/binary/kernels.rs:287:13:
   failed to downcast array
   ```
   
   (The panic line depends on the left-hand type: `Utf8` → :284, `Utf8View` → 
:287, `LargeUtf8` → :290.)
   
   ### Expected behavior
   
   - `'a' SIMILAR TO NULL` should return `NULL` (as `LIKE`/`~` do), not panic.
   - Mixed string types (`Utf8View` / `Utf8` / `LargeUtf8`) should be coerced 
to a common type, as already happens for `LIKE` (`like_coercion`) and the regex 
binary operators (`regex_coercion`).
   - Even on an unexpected type mismatch, the kernel should return an internal 
error instead of panicking.
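
To make the first two expectations concrete, here is a minimal standalone sketch of picking a common string type for the two operands. `StrType` and `common_string_type` are hypothetical stand-ins, not DataFusion's `DataType` or `regex_coercion`; the preference order (`Utf8View`, then `LargeUtf8`, then `Utf8`) and the handling of a `NULL`-typed side are assumptions made for illustration.

```rust
/// Simplified stand-in for the operand types discussed in this issue.
/// (Hypothetical: the real code works on arrow's `DataType`.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Pick one type that both `SIMILAR TO` operands would be cast to.
/// Assumed preference order: Utf8View > LargeUtf8 > Utf8.
/// A NULL-typed side adopts the string type of the other side.
fn common_string_type(lhs: StrType, rhs: StrType) -> StrType {
    use StrType::*;
    match (lhs, rhs) {
        (Utf8View, _) | (_, Utf8View) => Utf8View,
        (LargeUtf8, _) | (_, LargeUtf8) => LargeUtf8,
        (Utf8, _) | (_, Utf8) => Utf8,
        // Assumption: two untyped NULLs fall back to Utf8.
        (Null, Null) => Utf8,
    }
}
```

With both sides cast to the returned type, the kernel's left-driven downcast would see matching array types for the two reproductions in this report.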
   
   ### Additional context
   
   Root cause is twofold:
   
   1. `TypeCoercionRewriter` 
(`datafusion/optimizer/src/analyzer/type_coercion.rs`) handles `Expr::Like` 
explicitly but lists `Expr::SimilarTo(_)` in the no-op arm, so its operands are 
never coerced.
   2. `regex_match_dyn` / `regexp_is_match_flag!` panic instead of returning 
`internal_err!` on downcast failure.
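
As a rough illustration of the second point, the sketch below replaces `.expect()` with a fallible downcast. It uses only the standard library: `Array`, `Utf8Array`, `Utf8ViewArray`, `downcast_or_err`, and `typed_operands` are hypothetical stand-ins for the arrow types and the kernel, and a plain `String` error stands in for whatever `internal_err!` would produce. It is not the actual patch.

```rust
use std::any::Any;

// Hypothetical stand-ins for two physical string array types.
struct Utf8Array(Vec<String>);
struct Utf8ViewArray(Vec<String>);

// Minimal trait mirroring the `as_any()` downcast pattern used by the kernel.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

impl Array for Utf8Array {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

impl Array for Utf8ViewArray {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Downcast `array` to `T`, returning an error instead of panicking on mismatch.
fn downcast_or_err<'a, T: 'static>(array: &'a dyn Array, side: &str) -> Result<&'a T, String> {
    array.as_any().downcast_ref::<T>().ok_or_else(|| {
        format!(
            "Internal error: failed to downcast {} array to {}",
            side,
            std::any::type_name::<T>()
        )
    })
}

/// Resolve both operands to the type chosen from the left side (`Utf8Array` here).
/// A mismatched right side surfaces as `Err`, not a panic.
fn typed_operands<'a>(
    left: &'a dyn Array,
    right: &'a dyn Array,
) -> Result<(&'a Utf8Array, &'a Utf8Array), String> {
    let ll = downcast_or_err::<Utf8Array>(left, "left")?;
    let rr = downcast_or_err::<Utf8Array>(right, "right")?;
    Ok((ll, rr))
}
```

Passing a `Utf8Array` on the left and a `Utf8ViewArray` on the right yields an `Err` that the caller can propagate, which is the behavior requested under "Expected behavior".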
   
   I have a fix (coerce `SimilarTo` operands with `regex_coercion`, mirroring 
`Expr::Like`, plus harden the kernel downcasts) and will open a PR shortly.
   
   Related: #15461
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

