Amogh-2404 opened a new pull request, #22497:
URL: https://github.com/apache/datafusion/pull/22497
## Which issue does this PR close?
- Closes #22253.
## Rationale for this change
PostgreSQL returns the input unchanged when `replace` is called with an
empty `from`. DataFusion instead inserted `to` at every character boundary,
including both ends, so `replace('abc', '', 'x')` returned `xaxbxcx`. This PR
brings the behaviour in line with PostgreSQL. It is part of the
PG-compatibility cleanup tracked in #22247.
## What changes are included in this PR?
- `datafusion/functions/src/string/replace.rs`: the empty-`from` branch in
`apply_replace` now writes the input verbatim instead of inserting `to`. Added
a `LargeUtf8` unit test for the new behaviour.
- `datafusion/sqllogictest/test_files/string/string_literal.slt`: four new
SLT asserts covering the `Utf8`, `Dictionary`, `Utf8View`, and `LargeUtf8`
paths.
- `datafusion/sqllogictest/test_files/string/string_query.slt.part`: updated
four expected rows that were asserting the old buggy output.
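
For readers who want the semantics at a glance, here is a minimal sketch of the empty-`from` guard described in the first bullet. It is not the actual `apply_replace` code from `replace.rs`: the function names `replace_pg_style` and `replace_old_style` are hypothetical, and the sketch works on plain `&str` values rather than Arrow arrays. It assumes only that Rust's `str::replace` matches an empty pattern at every character boundary, which reproduces the old `xaxbxcx` output.

```rust
/// Hypothetical stand-in for the new behaviour (not DataFusion's real
/// `apply_replace`): an empty `from` leaves the input untouched.
fn replace_pg_style(input: &str, from: &str, to: &str) -> String {
    if from.is_empty() {
        // PostgreSQL semantics: nothing to search for, so copy the input verbatim.
        return input.to_string();
    }
    input.replace(from, to)
}

/// Hypothetical stand-in for the old behaviour: delegating straight to
/// `str::replace`, whose empty pattern matches at every character boundary.
fn replace_old_style(input: &str, from: &str, to: &str) -> String {
    input.replace(from, to)
}
```

Under these assumptions, `replace_pg_style("abc", "", "x")` yields `abc`, while `replace_old_style("abc", "", "x")` yields the `xaxbxcx` result quoted in the rationale. Calls with a non-empty `from` behave the same in both versions.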
## Are these changes tested?
Yes. The unit test in `replace.rs` covers the `LargeUtf8` path, and the four
new SLT asserts in `string_literal.slt` cover the `Utf8`, `Dictionary`,
`Utf8View`, and `LargeUtf8` paths end-to-end. The full SLT suite passes locally.
## Are there any user-facing changes?
Yes. `replace(str, '', x)` now returns `str` unchanged instead of inserting
`x` at every character boundary, so `replace('abc', '', 'x')` returns `abc`
rather than `xaxbxcx`. This matches PostgreSQL.