kosiew commented on code in PR #19827:
URL: https://github.com/apache/datafusion/pull/19827#discussion_r2703524459
##########
datafusion/functions/src/regex/regexpreplace.rs:
##########
@@ -189,13 +189,15 @@ fn regexp_replace_func(args: &[ColumnarValue]) -> Result<ArrayRef> {
}
}
-/// replace POSIX capture groups (like \1) with Rust Regex group (like ${1})
+/// replace POSIX capture groups (like \1 or \\1) with Rust Regex group (like ${1})
/// used by regexp_replace
+/// Handles both single backslash (\1) and double backslash (\\1) which can occur
+/// when SQL strings with escaped backslashes are passed through
fn regex_replace_posix_groups(replacement: &str) -> String {
static CAPTURE_GROUPS_RE_LOCK: LazyLock<Regex> =
- LazyLock::new(|| Regex::new(r"(\\)(\d*)").unwrap());
+ LazyLock::new(|| Regex::new(r"\\{1,2}(\d+)").unwrap());
Review Comment:
This also matches \0, which will be rewritten as ${0}. In Rust’s regex
replacement syntax, ${0} refers to the entire match, not to a parenthesized
capture group. Is \0 meant to be a valid capture reference, or should it be
treated literally?
Let's add a test that explicitly documents the behaviour of \0.
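
As a rough illustration of the kind of test meant here, the sketch below uses a hand-written, std-only stand-in for the rewrite step. `rewrite_posix_groups` is a hypothetical name, not the PR's function; it only approximates the `\\{1,2}(\d+)` pattern (ASCII digits only) so the example compiles without the `regex` crate. It pins down that `\0` currently becomes `${0}`; whether that is the desired behaviour is still the open question.

```rust
/// Hypothetical stand-in for the PR's rewrite: turns `\N` or `\\N`
/// (one or two backslashes followed by digits) into `${N}`.
/// Assumption: ASCII digits only, which is narrower than regex `\d`.
fn rewrite_posix_groups(replacement: &str) -> String {
    let chars: Vec<char> = replacement.chars().collect();
    let mut out = String::with_capacity(replacement.len());
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '\\' {
            // Mirror the greedy `\\{1,2}`: try two backslashes first, then one.
            let slashes = if chars.get(i + 1) == Some(&'\\') { 2 } else { 1 };
            let mut matched = false;
            for n in (1..=slashes).rev() {
                let start = i + n;
                let mut end = start;
                while end < chars.len() && chars[end].is_ascii_digit() {
                    end += 1;
                }
                if end > start {
                    // At least one digit followed the backslash(es).
                    out.push_str("${");
                    out.extend(&chars[start..end]);
                    out.push('}');
                    i = end;
                    matched = true;
                    break;
                }
            }
            if matched {
                continue;
            }
        }
        // Not a group reference: copy the character through unchanged.
        out.push(chars[i]);
        i += 1;
    }
    out
}

#[test]
fn backslash_zero_is_rewritten_to_group_zero() {
    // Documents the current behaviour: `\0` is treated as a group reference.
    assert_eq!(rewrite_posix_groups(r"\0"), "${0}");
    assert_eq!(rewrite_posix_groups(r"\\0"), "${0}");
    // Ordinary numbered groups, with one or two backslashes.
    assert_eq!(rewrite_posix_groups(r"X\1Y"), "X${1}Y");
    assert_eq!(rewrite_posix_groups(r"\\2"), "${2}");
}
```

If `\0` should instead be kept literal, the same test can be flipped to assert that the input comes back unchanged.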
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]