Shuo-O opened a new pull request, #51462:
URL: https://github.com/apache/doris/pull/51462
### What problem does this PR solve?
**Issue Number:** close #51351
**Related PR:** #
Doris currently lacks a function that returns the 1-based index of a regex match.
Users must rely on UDFs or application logic to find a match position. This PR
adds `regexp_position(str, pattern[, start])`:
- `regexp_position(str, pattern)`: returns the position of the first match, or -1 if there is none.
- `regexp_position(str, pattern, start)`: parses `start` as a 1-based
integer; if it is invalid or <1, returns -1; otherwise searches from position `start`.
- BE uses RE2 with thread-local pattern caching; `start` parsed via
`std::stoll`.
- FE registers two overloads in Nereids and implements constant folding.
- BE-UT and regression tests cover empty strings, NULL, invalid/
out-of-range `start`, Unicode, word-boundary, etc.
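
For reviewers, the semantics listed above can be modeled with a short Python sketch. This is an illustration only, not the BE code: it uses Python's `re` instead of RE2, counts positions in characters rather than bytes, propagates NULL as `None`, treats a `start` beyond the end of the string as out of range, and reports the position relative to the whole string. All of these are assumptions; the PR description does not spell them out.

```python
import re


def regexp_position(s, pattern, start=1):
    """Illustrative model of the described behavior (not the Doris implementation)."""
    # Assumption: any NULL argument yields NULL.
    if s is None or pattern is None or start is None:
        return None
    # `start` is parsed as an integer; anything unparsable gives -1.
    try:
        start_pos = int(start)
    except (TypeError, ValueError):
        return -1
    # `start` is 1-based; values below 1 give -1.
    # Assumption: values past the end of the string also give -1.
    if start_pos < 1 or start_pos > len(s) + 1:
        return -1
    # Search from the 0-based offset and convert the hit back to 1-based.
    match = re.compile(pattern).search(s, start_pos - 1)
    return match.start() + 1 if match else -1


if __name__ == "__main__":
    print(regexp_position("hello world", "o"))      # first match
    print(regexp_position("hello world", "o", 6))   # search from position 6
    print(regexp_position("hello", "l", "abc"))     # invalid start
```

The regression tests in this PR remain the reference for actual behavior, in particular for Unicode input, where byte-based and character-based positions can differ.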
---
### Release note
Add `regexp_position(str, pattern[, start])` to return the 1-based index of
a regex match (or -1 if none). Supports an optional `start`. Includes BE
implementation, FE registration, constant folding, unit tests, and regression
tests.
---
### Check List (For Author)
- Test
    - [x] Regression test (`test_string_function_regexp.groovy` / `.out`)
    - [x] Unit Test (`function_regexp_position_test.cpp`)
    - [x] Manual test (SQL client queries & FE constant folding)
- Behavior changed:
    - [x] No
- Need documentation?
    - [x] Yes: https://github.com/apache/doris-website/pull/2071
---
### Check List (For Reviewer)
- [ ] Confirm release note
- [ ] Confirm test cases
- [ ] Confirm documentation
- [ ] Add branch-pick label if needed
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]