GaneshPatil7517 commented on code in PR #19827:
URL: https://github.com/apache/datafusion/pull/19827#discussion_r2703542503


##########
datafusion/functions/src/regex/regexpreplace.rs:
##########
@@ -189,13 +189,15 @@ fn regexp_replace_func(args: &[ColumnarValue]) -> Result<ArrayRef> {
     }
 }
 
-/// replace POSIX capture groups (like \1) with Rust Regex group (like ${1})
+/// replace POSIX capture groups (like \1 or \\1) with Rust Regex group (like ${1})
 /// used by regexp_replace
+/// Handles both single backslash (\1) and double backslash (\\1) which can occur
+/// when SQL strings with escaped backslashes are passed through
 fn regex_replace_posix_groups(replacement: &str) -> String {
     static CAPTURE_GROUPS_RE_LOCK: LazyLock<Regex> =
-        LazyLock::new(|| Regex::new(r"(\\)(\d*)").unwrap());
+        LazyLock::new(|| Regex::new(r"\\{1,2}(\d+)").unwrap());

Review Comment:
   Thank you for the feedback. Great catch on the \0 behavior.
   
   I've added documentation and tests that make this behavior explicit:
   
   \0 is converted to ${0}, which in Rust's regex replacement syntax substitutes the entire match.
   This is consistent with POSIX-style tools, where \0 (or &) refers to the entire matched string.
   Added test cases:
   
   // Test \0 behavior: \0 is converted to ${0}, which in Rust's regex
   // replacement syntax substitutes the entire match. This is consistent
   // with POSIX behavior where \0 (or &) refers to the entire matched string.
   assert_eq!(regex_replace_posix_groups(r"\0"), "${0}");
   assert_eq!(regex_replace_posix_groups(r"prefix\0suffix"), "prefix${0}suffix");
   
   So \0 is intentionally treated as a valid capture reference (to the entire match), not as a literal. This aligns with most regex engines, including Rust's regex crate, where capture group 0 always refers to the entire match.
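   To make the new pattern's behavior concrete, here is a minimal, dependency-free sketch (hypothetical, not the code in this PR) that hand-implements what \\{1,2}(\d+) matches under leftmost-first, greedy semantics and rewrites each match as ${N}. With the regex crate itself this would presumably be a replace_all call with a replacement string such as "$${$1}" (in that syntax, $$ emits a literal $); that exact replacement string is an assumption, since the hunk does not show it. The names posix_groups_to_rust and match_group_ref are illustrative only, and the sketch handles ASCII digits only, whereas the regex crate's \d is Unicode-aware by default.

   ```rust
   /// Hypothetical helper: try to match `\\{1,2}(\d+)` starting at `i`.
   /// On success, return the digit span `(start, end)` and the match end.
   fn match_group_ref(chars: &[char], i: usize) -> Option<(usize, usize)> {
       // Greedy `\\{1,2}`: prefer two backslashes, then fall back to one.
       for n in [2usize, 1] {
           let start = i + n;
           if start > chars.len() || chars[i..start].iter().any(|&c| c != '\\') {
               continue;
           }
           // `(\d+)`: one or more digits, consumed greedily.
           // Assumption: ASCII digits only (regex's `\d` is Unicode-aware).
           let mut end = start;
           while end < chars.len() && chars[end].is_ascii_digit() {
               end += 1;
           }
           if end > start {
               return Some((start, end));
           }
       }
       None
   }

   /// Hypothetical sketch: rewrite `\N` / `\\N` into Rust's `${N}` syntax.
   fn posix_groups_to_rust(replacement: &str) -> String {
       let chars: Vec<char> = replacement.chars().collect();
       let mut out = String::with_capacity(replacement.len());
       let mut i = 0;
       while i < chars.len() {
           if let Some((start, end)) = match_group_ref(&chars, i) {
               // Replace the whole match (backslashes + digits) with `${N}`.
               out.push_str("${");
               out.extend(&chars[start..end]);
               out.push('}');
               i = end;
           } else {
               // No match starting here: copy the character through unchanged.
               out.push(chars[i]);
               i += 1;
           }
       }
       out
   }
   ```

   With this sketch, \0 becomes ${0} and \\12 becomes ${12}. One edge case worth noting: a run of three backslashes before a digit, such as \\\1, keeps its first backslash and becomes \${1}, because a match can only begin at the second backslash.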


