Anthrino commented on code in PR #3460:
URL: https://github.com/apache/calcite/pull/3460#discussion_r1353373965
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -603,6 +605,38 @@ public String regexpReplace(String s, String regex, String replacement,
return Unsafe.regexpReplace(s, pattern, replacement, pos, occurrence);
}
+ /** SQL {@code REGEXP_REPLACE} function with 3 arguments with
+ * {@code \\} based indexing for capturing groups.
+ */
+ public String regexpReplaceNonDollarIndexed(String s, String regex,
+ String replacement) {
+ // Preprocessing to convert double-backslash based indexing for capturing
+ // groups into $ based indices recognized by java regex.
+
+ // Explicitly escaping any $ symbols coming from input
+ // to ignore them from being considered as capturing group index.
+ String indexedReplacement = replacement.replace("\\\\", "\\")
+ .replace("$", "\\$");
+
+ // Check each occurrence of escaped chars, convert \<n> integers into
+ // $<n> indices, keep \\ and \$, throw an error for any other invalid
+ // escapes.
+ int lastOccIdx = 0;
+ while (lastOccIdx != -1) {
Review Comment:
I agree and would love to reduce the complexity here, but we need to check
every occurrence of a double backslash from left to right to determine whether
it is used for group indexing or for escaping the character that follows. A
backslash can be replaced with $ only if the escaped character is a digit, so
we cannot blindly replace all double backslashes in one go.
We can revisit this and check whether there is a simpler approach, but this
logic matches how BigQuery (BQ) evaluates input expressions, especially when
multiple backslashes appear together (example [test
case](https://github.com/apache/calcite/blob/166efa4152cd448b9618aba39944215955f2d10f/testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java#L4828)).
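The left-to-right scan described above can be sketched roughly as follows. This is a minimal, hypothetical Java sketch, not the code in this PR. It assumes the replacement has already gone through SQL string-literal unescaping (so `\1` is one backslash followed by a digit) and follows the BigQuery rules that `\0` to `\9` refer to capture groups, `\\` is a literal backslash, and any other escape is an error. The names `RegexpReplaceSketch`, `toJavaReplacement`, and `regexpReplace` are illustrative only.

```java
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not the PR's code): converts a BigQuery-style
 * replacement string, where \0..\9 are group references and \\ is a
 * literal backslash, into Java's Matcher replacement syntax.
 */
final class RegexpReplaceSketch {

  private RegexpReplaceSketch() {}

  /** Scans left to right, resolving each backslash together with the
   * character that follows it. */
  static String toJavaReplacement(String replacement) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < replacement.length(); i++) {
      char c = replacement.charAt(i);
      if (c == '\\') {
        if (i + 1 == replacement.length()) {
          throw new IllegalArgumentException(
              "Replacement ends with a lone backslash");
        }
        char next = replacement.charAt(++i);
        if (next >= '0' && next <= '9') {
          // \<n> is a group reference: emit Java's $<n>.
          out.append('$').append(next);
        } else if (next == '\\') {
          // \\ is a literal backslash: keep it escaped for Java.
          out.append("\\\\");
        } else {
          // Any other escape is rejected.
          throw new IllegalArgumentException(
              "Invalid escape sequence in replacement: \\" + next);
        }
      } else if (c == '$' || (c >= '0' && c <= '9')) {
        // Escape literal $ and digits so Java never reads them as (part of)
        // a group reference, e.g. the "2" in \12 must stay a literal.
        out.append('\\').append(c);
      } else {
        out.append(c);
      }
    }
    return out.toString();
  }

  /** Hypothetical 3-argument REGEXP_REPLACE built on the conversion above. */
  static String regexpReplace(String s, String regex, String replacement) {
    return Pattern.compile(regex).matcher(s)
        .replaceAll(toJavaReplacement(replacement));
  }

  public static void main(String[] args) {
    // Swaps the two words: prints "world hello".
    System.out.println(
        regexpReplace("hello world", "(\\w+) (\\w+)", "\\2 \\1"));
  }
}
```

Because each backslash is consumed together with the character after it, a run such as `\\\1` comes out as a literal backslash followed by group 1, which a single global `replace` of `\\` cannot distinguish. Literal digits are escaped because Java's `Matcher` would otherwise read `$12` as a reference to group 12 whenever the pattern has that many groups.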