GaneshPatil7517 commented on code in PR #19827:
URL: https://github.com/apache/datafusion/pull/19827#discussion_r2703542503
##########
datafusion/functions/src/regex/regexpreplace.rs:
##########
@@ -189,13 +189,15 @@ fn regexp_replace_func(args: &[ColumnarValue]) -> Result<ArrayRef> {
}
}
-/// replace POSIX capture groups (like \1) with Rust Regex group (like ${1})
+/// replace POSIX capture groups (like \1 or \\1) with Rust Regex group (like ${1})
/// used by regexp_replace
+/// Handles both single backslash (\1) and double backslash (\\1) which can occur
+/// when SQL strings with escaped backslashes are passed through
fn regex_replace_posix_groups(replacement: &str) -> String {
static CAPTURE_GROUPS_RE_LOCK: LazyLock<Regex> =
- LazyLock::new(|| Regex::new(r"(\\)(\d*)").unwrap());
+ LazyLock::new(|| Regex::new(r"\\{1,2}(\d+)").unwrap());
Review Comment:
Thank you for the feedback. Great catch on the \0 behavior.
I've added documentation and a test to make this behavior explicit.
\0 is converted to ${0}, which in Rust's regex replacement syntax
substitutes the entire match.
This is consistent with POSIX behavior, where \0 (or &) refers to the entire
matched string.
Added test cases:
// Test \0 behavior: \0 is converted to ${0}, which in Rust's regex
// replacement syntax substitutes the entire match. This is consistent
// with POSIX behavior where \0 (or &) refers to the entire matched string.
assert_eq!(regex_replace_posix_groups(r"\0"), "${0}");
assert_eq!(regex_replace_posix_groups(r"prefix\0suffix"),
"prefix${0}suffix");
So \0 is intentionally treated as a capture reference to the entire match
rather than as literal text. This aligns with regex tools that accept \0
as a reference to the whole match.
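
For readers who want to experiment with the conversion outside of DataFusion, here is a minimal sketch of the same rewrite rule. It is not the code in the PR: `posix_groups_sketch` is a hypothetical name, and the function hand-rolls an approximation of the pattern `\\{1,2}(\d+)` so that it needs no external crate. One known difference is that it only recognizes ASCII digits, whereas `\d` in the regex crate is Unicode-aware by default.

```rust
/// Hypothetical, dependency-free approximation of the PR's pattern
/// `\\{1,2}(\d+)`: one or two backslashes followed by at least one
/// ASCII digit are rewritten to `${N}`. Everything else is copied as-is.
fn posix_groups_sketch(replacement: &str) -> String {
    let bytes = replacement.as_bytes();
    let mut out: Vec<u8> = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'\\' {
            // Prefer two backslashes, mirroring the greedy `{1,2}` quantifier.
            let slashes = if bytes.get(i + 1) == Some(&b'\\') { 2 } else { 1 };
            let start = i + slashes;
            let mut end = start;
            while end < bytes.len() && bytes[end].is_ascii_digit() {
                end += 1;
            }
            if end > start {
                // Emit the Rust-style group reference, e.g. `${0}` or `${12}`.
                out.extend_from_slice(b"${");
                out.extend_from_slice(&bytes[start..end]);
                out.push(b'}');
                i = end;
                continue;
            }
        }
        // Not a group reference: copy the byte unchanged.
        out.push(bytes[i]);
        i += 1;
    }
    // Only ASCII sequences were rewritten, so the bytes are still valid UTF-8.
    String::from_utf8(out).expect("only ASCII sequences are rewritten")
}
```

Under these assumptions, `posix_groups_sketch(r"\0")` yields `${0}`, and a backslash that is not followed by a digit (for example `\n`) is left untouched.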