adriangb opened a new issue, #22886:
URL: https://github.com/apache/datafusion/issues/22886
### Describe the bug
`SIMILAR TO` panics with `failed to downcast array` whenever the two
operands end up as arrays of different physical types. Unlike `LIKE` and the
regex binary operators (`~`, `~*`, ...), the `TypeCoercion` analyzer never
coerces `Expr::SimilarTo` operands to a common string type (it is listed in the
"nothing to coerce" arm of `TypeCoercionRewriter`), and the kernel that
ultimately executes it (`regex_match_dyn` in
`datafusion/physical-expr/src/expressions/binary/kernels.rs`) picks the
downcast type from the **left** array and then downcasts the right array to
that same type with a bare `.expect()`:
```rust
macro_rules! regexp_is_match_flag {
    ($LEFT:expr, $RIGHT:expr, $ARRAYTYPE:ident, $NOT:expr, $FLAG:expr) => {{
        let ll = $LEFT.as_any().downcast_ref::<$ARRAYTYPE>().expect("failed to downcast array");
        let rr = $RIGHT.as_any().downcast_ref::<$ARRAYTYPE>().expect("failed to downcast array");
        ...
```
Literal patterns take a scalar fast path (`regex_match_dyn_scalar`), which
is why the common `col SIMILAR TO 'pattern'` form works and this went
unnoticed. Any mismatch that reaches the array-array path panics, e.g.:
- a `NULL` pattern (right side is a `NullArray`) — this even panics at
*plan* time via constant folding (`ConstEvaluator`),
- a `Utf8View` column matched against a non-literal `Utf8` pattern (or any
other string-type combination).
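The following std-only model makes the two paths concrete. `Array`, `StringArray`, `StringViewArray`, `NullArray`, `Pattern` and `right_downcast_ok` are hypothetical stand-ins, not the real Arrow/DataFusion types. The model only assumes what is described above: the downcast type comes from the left array, and a literal pattern never has its right-hand side downcast.

```rust
use std::any::Any;

// Hypothetical stand-ins for Arrow array types; only their concrete Rust type matters here.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

#[allow(dead_code)]
struct StringArray(Vec<Option<String>>);
#[allow(dead_code)]
struct StringViewArray(Vec<Option<String>>);
#[allow(dead_code)]
struct NullArray(usize);

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any { self }
}
impl Array for StringViewArray {
    fn as_any(&self) -> &dyn Any { self }
}
impl Array for NullArray {
    fn as_any(&self) -> &dyn Any { self }
}

/// Right-hand side of `expr SIMILAR TO pattern` (simplified).
#[allow(dead_code)]
enum Pattern<'a> {
    /// A literal pattern: handled by the scalar fast path.
    Scalar(Option<&'a str>),
    /// Any pattern that reaches the array-array path.
    Array(&'a dyn Array),
}

/// Returns `false` exactly where the array-array kernel's `.expect()` would panic.
fn right_downcast_ok(left: &dyn Array, pattern: &Pattern<'_>) -> bool {
    let right = match pattern {
        // Scalar fast path: the pattern is already a plain string, so no right-side downcast.
        Pattern::Scalar(_) => return true,
        Pattern::Array(right) => right,
    };
    // The target type is chosen from the LEFT array; the right array is assumed to match.
    if left.as_any().is::<StringArray>() {
        right.as_any().downcast_ref::<StringArray>().is_some()
    } else if left.as_any().is::<StringViewArray>() {
        right.as_any().downcast_ref::<StringViewArray>().is_some()
    } else {
        // Other left-hand types are not modeled in this sketch.
        false
    }
}
```

With a `Utf8View` left side, a literal pattern passes, while a non-literal `Utf8` pattern or a `NullArray` pattern returns `false`, matching the two repros below.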
This is likely the same root cause as #15461, which was closed as not
reproducible.
We hit this in production with `lower(col) SIMILAR TO '...'`-shaped queries
over `Utf8View` columns.
### To Reproduce
Reproduced on current `main` (3f52debc53, 2026-06-10) with `datafusion-cli`.
**1. NULL pattern (panics during planning, via constant folding):**
```sql
SELECT 'a' SIMILAR TO NULL;
```
```
thread 'main' panicked at datafusion/physical-expr/src/expressions/binary/kernels.rs:284:13:
failed to downcast array
```
Backtrace excerpt:
```
5: datafusion_physical_expr::expressions::binary::kernels::regex_match_dyn
6: datafusion_physical_expr::expressions::binary::BinaryExpr::evaluate_with_resolved_args
8: datafusion_optimizer::simplify_expressions::expr_simplifier::ConstEvaluator::evaluate_to_scalar
```
**2. Utf8View vs non-literal Utf8 pattern (panics at runtime):**
```sql
CREATE TABLE t AS SELECT * FROM (VALUES ('user auth failed')) v(s);
CREATE TABLE p AS SELECT * FROM (VALUES ('(auth|login)')) v(pat);
SELECT arrow_cast(t.s, 'Utf8View') SIMILAR TO p.pat FROM t CROSS JOIN p;
```
```
thread 'main' panicked at datafusion/physical-expr/src/expressions/binary/kernels.rs:287:13:
failed to downcast array
```
(The panic line depends on the left-hand type: `Utf8` → :284, `Utf8View` →
:287, `LargeUtf8` → :290.)
### Expected behavior
- `'a' SIMILAR TO NULL` should return `NULL` (as `LIKE`/`~` do), not panic.
- Mixed string types (`Utf8View` / `Utf8` / `LargeUtf8`) should be coerced
to a common type, as already happens for `LIKE` (`like_coercion`) and the regex
binary operators (`regex_coercion`).
- Even on an unexpected type mismatch, the kernel should return an internal
error instead of panicking.
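A rough sketch of the common-type rule from the second bullet is shown below. `StrType` and `common_string_type` are hypothetical names. The precedence it encodes (`Utf8View` over `LargeUtf8` over `Utf8`, with an untyped `NULL` adopting the other side's type) is an assumption for illustration; the authoritative rules are whatever `regex_coercion` implements. Casting both operands to the result would let a `NULL` pattern reach the kernel as a null string of the matching type, so the kernel can propagate `NULL` instead of failing the downcast.

```rust
/// Hypothetical subset of Arrow's string-ish `DataType` variants.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Pick one type for both `SIMILAR TO` operands.
/// ASSUMPTION: precedence Utf8View > LargeUtf8 > Utf8; an untyped NULL adopts
/// the other side's type, and NULL vs NULL falls back to Utf8.
fn common_string_type(lhs: StrType, rhs: StrType) -> StrType {
    use StrType::*;
    match (lhs, rhs) {
        (Null, Null) => Utf8,
        (Null, other) | (other, Null) => other,
        (Utf8View, _) | (_, Utf8View) => Utf8View,
        (LargeUtf8, _) | (_, LargeUtf8) => LargeUtf8,
        (Utf8, Utf8) => Utf8,
    }
}
```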
### Additional context
The root cause is twofold:
1. `TypeCoercionRewriter`
(`datafusion/optimizer/src/analyzer/type_coercion.rs`) handles `Expr::Like`
explicitly but lists `Expr::SimilarTo(_)` in the no-op arm, so its operands are
never coerced.
2. `regex_match_dyn` / `regexp_is_match_flag!` panic instead of returning
`internal_err!` on downcast failure.
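For point 2, a fallible downcast could look roughly like the sketch below: each side is checked, and a mismatch becomes an `Err` instead of a panic. The array types are hypothetical stand-ins rather than Arrow's, `downcast_pair` is a hypothetical helper name, and the plain `String` error stands in for whatever the real kernel would return via `internal_err!`.

```rust
use std::any::{type_name, Any};

// Hypothetical stand-ins for Arrow array types.
trait Array {
    fn as_any(&self) -> &dyn Any;
}

#[allow(dead_code)]
struct StringArray(Vec<Option<String>>);
#[allow(dead_code)]
struct StringViewArray(Vec<Option<String>>);

impl Array for StringArray {
    fn as_any(&self) -> &dyn Any { self }
}
impl Array for StringViewArray {
    fn as_any(&self) -> &dyn Any { self }
}

/// Downcast both operands to `T`, reporting a mismatch as an error instead of panicking.
fn downcast_pair<'a, T: Any>(
    left: &'a dyn Array,
    right: &'a dyn Array,
) -> Result<(&'a T, &'a T), String> {
    let ll = left
        .as_any()
        .downcast_ref::<T>()
        .ok_or_else(|| format!("failed to downcast left array to {}", type_name::<T>()))?;
    let rr = right
        .as_any()
        .downcast_ref::<T>()
        .ok_or_else(|| format!("failed to downcast right array to {}", type_name::<T>()))?;
    Ok((ll, rr))
}
```

Once operands are coerced this error should be unreachable; it only turns an unexpected mismatch into a query error rather than a process abort.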
I have a fix (coerce `SimilarTo` operands with `regex_coercion`, mirroring
`Expr::Like`, plus harden the kernel downcasts) and will open a PR shortly.
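Schematically, the coercion half of that fix wraps whichever operand lacks the common type in a cast, so both arrays share one physical type by the time the kernel runs. The sketch below is a toy model, not `TypeCoercionRewriter`: `Expr`, `StrType`, `cast_to` and `coerce_similar_to` are hypothetical, and the target type is passed in directly (in the actual fix it would come from `regex_coercion`).

```rust
/// Hypothetical subset of string-ish Arrow types.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum StrType {
    Null,
    Utf8,
    LargeUtf8,
    Utf8View,
}

/// Toy expression tree: just enough to show where casts get inserted.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A column or literal with a known type (the name is only for readability).
    Typed(&'static str, StrType),
    /// An explicit cast inserted by coercion.
    Cast(Box<Expr>, StrType),
}

impl Expr {
    fn ty(&self) -> StrType {
        match self {
            Expr::Typed(_, t) | Expr::Cast(_, t) => *t,
        }
    }

    /// Wrap in a cast only when the type actually differs.
    fn cast_to(self, target: StrType) -> Expr {
        if self.ty() == target {
            self
        } else {
            Expr::Cast(Box::new(self), target)
        }
    }
}

/// Coerce both operands of `expr SIMILAR TO pattern` to `target`.
fn coerce_similar_to(expr: Expr, pattern: Expr, target: StrType) -> (Expr, Expr) {
    (expr.cast_to(target), pattern.cast_to(target))
}
```

For repro 2, assuming the common type is `Utf8View`, only `p.pat` gets wrapped in a cast and both operands end up with the same type.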
Related: #15461