andygrove opened a new issue, #4497:
URL: https://github.com/apache/datafusion-comet/issues/4497
## Describe the bug
`replace(str, search, replacement)` diverges from Spark when `search` is an
empty string literal. Spark returns `str` unchanged (`StringReplace.eval`
short-circuits on `search.numBytes == 0`). Comet delegates to DataFusion's
`replace`, which inserts `replacement` between every pair of characters and at
both ends of the string.
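
For illustration, the two behaviours can be sketched in plain Rust. This is a minimal sketch, not Comet or DataFusion code: `spark_like_replace` and `naive_replace` are hypothetical names, and the assumption is only that Rust's standard `str::replace` treats an empty pattern as matching at every character boundary, which reproduces the output reported for DataFusion in this issue.

```rust
/// Hypothetical helper (not part of Comet or DataFusion): mirrors the Spark
/// semantics described in this issue by returning the input unchanged when
/// `search` is empty.
fn spark_like_replace(s: &str, search: &str, replacement: &str) -> String {
    if search.is_empty() {
        // Same idea as Spark's short-circuit on `search.numBytes == 0`.
        return s.to_string();
    }
    s.replace(search, replacement)
}

/// Hypothetical helper: plain `str::replace` with no guard. An empty pattern
/// matches at every character boundary, including the start and the end.
fn naive_replace(s: &str, search: &str, replacement: &str) -> String {
    s.replace(search, replacement)
}
```

With these definitions, `spark_like_replace("hello", "", "x")` yields `hello`, while `naive_replace("hello", "", "x")` yields `xhxexlxlxox`. For a non-empty `search` the two helpers agree.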
## How to Reproduce
```sql
SELECT replace('hello', '', 'x');
```
| Engine | Result |
| --- | --- |
| Spark | `hello` |
| Comet (DataFusion) | `xhxexlxlxox` |
## Additional context
Surfaced by the follow-up to the string-expressions audit (#4461). Issue #3344
covered the same divergence, but its body had the expected and actual values
swapped, which led to it being closed as already fixed. This issue restates the
divergence with the correct direction.
Workaround: the `audit-comet-expression` follow-up marks `replace` as
`Incompatible(Some(reason))` only when `searchExpr` is a literal empty string,
so the dispatcher falls back to Spark for that specific case unless the user
opts in with `spark.comet.expression.StringReplace.allowIncompatible=true`.
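
The fallback decision described above can be modelled roughly as follows. This is a sketch under stated assumptions, not the actual Comet dispatcher: the `SupportLevel` enum, `replace_support_level`, and `runs_natively` are hypothetical names that only mirror the wording of this issue, and the reason string is made up for illustration.

```rust
/// Hypothetical model of the support levels named in this issue.
#[derive(Debug, PartialEq)]
enum SupportLevel {
    Compatible,
    Incompatible(Option<String>),
}

/// `search_literal` is `Some(s)` when the search argument is a string literal
/// and `None` for anything else (for example a column reference).
fn replace_support_level(search_literal: Option<&str>) -> SupportLevel {
    match search_literal {
        // Only the empty literal is flagged; the reason text is illustrative.
        Some("") => SupportLevel::Incompatible(Some(
            "empty search string differs from Spark".to_string(),
        )),
        _ => SupportLevel::Compatible,
    }
}

/// `allow_incompatible` stands in for
/// `spark.comet.expression.StringReplace.allowIncompatible`.
fn runs_natively(level: &SupportLevel, allow_incompatible: bool) -> bool {
    match level {
        SupportLevel::Compatible => true,
        // Fall back to Spark unless the user has opted in.
        SupportLevel::Incompatible(_) => allow_incompatible,
    }
}
```

Under this model, `replace(col, '', 'x')` falls back to Spark by default and runs natively only when the opt-in flag is set, while any non-empty or non-literal `search` is unaffected.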