Sean-Kenneth-Doherty opened a new pull request, #22286:
URL: https://github.com/apache/datafusion/pull/22286
## Which issue does this PR close?
- Closes #22257.
## Rationale for this change
PostgreSQL treats an empty regular expression in `regexp_instr` as a
zero-width match. DataFusion previously special-cased empty patterns to return
`0`, so `regexp_instr('abc', '')` diverged from PostgreSQL.
## What changes are included in this PR?
- Handles empty `regexp_instr` patterns as zero-width matches at
`start + N - 1`, returning `0` when that position is past the end of the string.
- Keeps existing regex/flag validation for empty patterns.
- Lets start/N validation run before empty-string handling.
- Adds Rust unit coverage and sqllogictest coverage for empty-pattern
behavior.
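
For reviewers, the position arithmetic described above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the function name `empty_pattern_instr`, its signature, and the error messages are hypothetical, and it assumes that the zero-width match at the very end of the string (position `len + 1`) still counts as a match, so only positions beyond that return `0`.

```rust
/// Hypothetical helper illustrating the empty-pattern rule; not DataFusion's
/// actual implementation. Positions are 1-based and counted in characters.
fn empty_pattern_instr(value: &str, start: i64, n: i64) -> Result<i64, String> {
    // start/N validation runs before any empty-pattern handling.
    if start < 1 {
        return Err("start must be a positive integer".to_string());
    }
    if n < 1 {
        return Err("N must be a positive integer".to_string());
    }

    // An empty pattern matches with zero width at every character boundary,
    // so the N-th match at or after `start` sits at `start + N - 1`.
    let position = start.saturating_add(n - 1);

    // Assumption: the boundary after the last character (len + 1) is still a
    // valid zero-width match; anything beyond it is past the end of the string.
    let len = value.chars().count() as i64;
    if position > len + 1 {
        Ok(0)
    } else {
        Ok(position)
    }
}
```

Under these assumptions, `empty_pattern_instr("abc", 1, 1)` yields `Ok(1)`, a large `N` such as `10` yields `Ok(0)`, and a non-positive `start` or `N` is rejected before the empty pattern is considered.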
## Are these changes tested?
- `cargo fmt --all`
- `cargo test -p datafusion-functions regex::regexpinstr::tests::test_regexp_instr -- --nocapture`
- `cargo test --profile=ci --test sqllogictests -- regexp/regexp_instr.slt`
- `TMPDIR=/home/sean/Projects/datafusion-contrib/target/tmp cargo clippy --all-targets --all-features -- -D warnings`
- `git diff --check`
## Are there any user-facing changes?
Yes. `regexp_instr` now matches PostgreSQL behavior for empty regular
expression patterns.