tanclary commented on code in PR #3460:
URL: https://github.com/apache/calcite/pull/3460#discussion_r1353072272
##########
babel/src/test/resources/sql/big-query.iq:
##########
@@ -707,7 +707,7 @@ SELECT SAFE_DIVIDE(CAST(1.7e308 as DOUBLE), CAST(1.7e-308
as DOUBLE)) as double_
!ok
-SELECT SAFE_DIVIDE(CAST(-3.5e75 AS DECIMAL(76, 0)), CAST(3.5e-75 AS
DECIMAL(76, 0)))
Review Comment:
Oops thanks for fixing
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -431,7 +433,7 @@ private static boolean checkPosOccurrenceParamValues(int
position,
throw
RESOURCE.invalidIntegerInputForRegexpFunctions(Integer.toString(occurrencePosition),
"occurrence_position", methodName).ex();
}
- if (position <= value.length()) {
+ if (position > value.length()) {
Review Comment:
Was this wrong before? Or why the change?
##########
core/src/main/java/org/apache/calcite/sql/validate/SqlConformanceEnum.java:
##########
@@ -240,6 +240,15 @@ public enum SqlConformanceEnum implements SqlConformance {
}
}
+ @Override public boolean isRegexReplaceCaptureGroupDollarIndexed() {
Review Comment:
I know I suggested this but reading it now I'm wondering whether this
belongs in
https://github.com/apache/calcite/blob/main/core/src/main/java/org/apache/calcite/sql2rel/StandardConvertletTable.java#L1288
especially since it is so function-specific. Could be a good question for
@julianhyde
##########
core/src/test/java/org/apache/calcite/test/SqlFunctionsTest.java:
##########
@@ -482,6 +482,35 @@ static <E> List<E> list() {
}
}
+ @Test void testRegexpReplaceNonDollarIndexed() {
+ final SqlFunctions.RegexFunction f = new SqlFunctions.RegexFunction();
+ assertThat(f.regexpReplaceNonDollarIndexed("abascusB", "b", "X"),
is("aXascusB"));
+ assertThat(f.regexpReplaceNonDollarIndexed("abc01def02ghi", "[a-z]+",
"X"), is("X01X02X"));
+ assertThat(f.regexpReplaceNonDollarIndexed("a0b1c2d3", "0|2", "X"),
is("aXb1cXd3"));
+
Review Comment:
remove blank line?
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -603,6 +605,38 @@ public String regexpReplace(String s, String regex, String
replacement,
return Unsafe.regexpReplace(s, pattern, replacement, pos, occurrence);
}
+ /** SQL {@code REGEXP_REPLACE} function with 3 arguments with
+ * {@code \\} based indexing for capturing groups.
+ */
+ public String regexpReplaceNonDollarIndexed(String s, String regex,
+ String replacement) {
+ // Preprocessing to convert double-backslash based indexing for capturing
+ // groups into $ based indices recognized by java regex.
Review Comment:
nit: "$-based" and "double-backslash-based"
##########
core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java:
##########
@@ -2428,6 +2425,33 @@ private static class ContainsSubstrImplementor extends
AbstractRexCallImplemento
}
}
+ /** Implementor for the {@code REGEXP_REPLACE} function. */
+ private static class RegexpReplaceImplementor extends
AbstractRexCallImplementor {
+ RegexpReplaceImplementor() {
+ super("regexp_replace", NullPolicy.STRICT, false);
+ }
+
+ @Override Expression implementSafe(final RexToLixTranslator translator,
+ final RexCall call, final List<Expression> argValueList) {
+
+ final boolean dollarIndexed =
Review Comment:
maybe a small comment here? When I read `dollarIndexed` I'm not sure what
that means without outside context.
##########
core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java:
##########
@@ -2428,6 +2425,33 @@ private static class ContainsSubstrImplementor extends
AbstractRexCallImplemento
}
}
+ /** Implementor for the {@code REGEXP_REPLACE} function. */
+ private static class RegexpReplaceImplementor extends
AbstractRexCallImplementor {
+ RegexpReplaceImplementor() {
+ super("regexp_replace", NullPolicy.STRICT, false);
+ }
+
+ @Override Expression implementSafe(final RexToLixTranslator translator,
+ final RexCall call, final List<Expression> argValueList) {
+
Review Comment:
can we remove this newline
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -603,6 +605,38 @@ public String regexpReplace(String s, String regex, String
replacement,
return Unsafe.regexpReplace(s, pattern, replacement, pos, occurrence);
}
+ /** SQL {@code REGEXP_REPLACE} function with 3 arguments with
+ * {@code \\} based indexing for capturing groups.
+ */
+ public String regexpReplaceNonDollarIndexed(String s, String regex,
+ String replacement) {
+ // Preprocessing to convert double-backslash based indexing for capturing
+ // groups into $ based indices recognized by java regex.
+
+ // Explicitly escaping any $ symbols coming from input
+ // to ignore them from being considered as capturing group index.
+ String indexedReplacement = replacement.replace("\\\\", "\\")
+ .replace("$", "\\$");
+
+ // Check each occurrence of escaped chars, convert \<n> integers into
+ // $<n> indices, keep \\ and \$, throw an error for any other invalid
escapes.
+ int lastOccIdx = 0;
+ while (lastOccIdx != -1) {
Review Comment:
This is a bit nested is there a way to flatten it some?
##########
core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java:
##########
@@ -515,7 +515,9 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding
operatorBinding,
OperandTypes.STRING_STRING_OPTIONAL_INTEGER_OPTIONAL_INTEGER_OPTIONAL_INTEGER,
SqlFunctionCategory.STRING);
- @LibraryOperator(libraries = {MYSQL, ORACLE})
+ /** The "REGEXP_REPLACE(value, regexp, rep [, pos [, occurrence [,
matchType]]])" function.
+ * Replaces all substrings of value that match regexp with rep and returns
modified value. */
+ @LibraryOperator(libraries = {MYSQL, ORACLE, BIG_QUERY})
Review Comment:
Make this alphabetical if you don't mind
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -546,7 +548,7 @@ public int regexpInstr(String value, String regex, int
position,
final String methodName = "REGEXP_INSTR";
final Pattern pattern = validateRegexPattern(regex, methodName);
- if (checkPosOccurrenceParamValues(position, occurrence,
occurrencePosition, value,
+ if (!validatePosOccurrenceParamValues(position, occurrence,
occurrencePosition, value,
methodName) || regex.isEmpty()) {
Review Comment:
Is it more optimal to check `regex.isEmpty()` first? Like if that is a
cheaper operation maybe we should do it first? What do you think?
##########
site/_docs/reference.md:
##########
@@ -2791,7 +2791,7 @@ BigQuery's type system uses confusingly different names
for types and functions:
| b | REGEXP_EXTRACT(string, regexp [, position [, occurrence]]) | Returns the
substring in *string* that matches the *regexp*, starting search at *position*
(default 1), and until locating the nth *occurrence* (default 1). Returns NULL
if there is no match
| b | REGEXP_EXTRACT_ALL(string, regexp) | Returns an array of all
substrings in *string* that matches the *regexp*. Returns an empty array if
there is no match
| b | REGEXP_INSTR(string, regexp [, position [, occurrence [,
occurrence_position]]]) | Returns the lowest 1-based position of the substring
in *string* that matches the *regexp*, starting search at *position* (default
1), and until locating the nth *occurrence* (default 1). Setting
occurrence_position (default 0) to 1 returns the end position of substring + 1.
Returns 0 if there is no match
-| m o | REGEXP_REPLACE(string, regexp, rep [, pos [, occurrence [,
matchType]]]) | Replaces all substrings of *string* that match *regexp* with
*rep* at the starting *pos* in expr (if omitted, the default is 1),
*occurrence* means which occurrence of a match to search for (if omitted, the
default is 1), *matchType* specifies how to perform matching
+| m o b | REGEXP_REPLACE(string, regexp, rep [, pos [, occurrence [,
matchType]]]) | Replaces all substrings of *string* that match *regexp* with
*rep* at the starting *pos* in expr (if omitted, the default is 1),
*occurrence* means which occurrence of a match to search for (if omitted, the
default is 1), *matchType* specifies how to perform matching
Review Comment:
nit: change "occurrence means which ..." to "occurrence specifies which ..."
to be consistent with the rest of the description
##########
site/_docs/reference.md:
##########
@@ -2791,7 +2791,7 @@ BigQuery's type system uses confusingly different names
for types and functions:
| b | REGEXP_EXTRACT(string, regexp [, position [, occurrence]]) | Returns the
substring in *string* that matches the *regexp*, starting search at *position*
(default 1), and until locating the nth *occurrence* (default 1). Returns NULL
if there is no match
| b | REGEXP_EXTRACT_ALL(string, regexp) | Returns an array of all
substrings in *string* that matches the *regexp*. Returns an empty array if
there is no match
| b | REGEXP_INSTR(string, regexp [, position [, occurrence [,
occurrence_position]]]) | Returns the lowest 1-based position of the substring
in *string* that matches the *regexp*, starting search at *position* (default
1), and until locating the nth *occurrence* (default 1). Setting
occurrence_position (default 0) to 1 returns the end position of substring + 1.
Returns 0 if there is no match
-| m o | REGEXP_REPLACE(string, regexp, rep [, pos [, occurrence [,
matchType]]]) | Replaces all substrings of *string* that match *regexp* with
*rep* at the starting *pos* in expr (if omitted, the default is 1),
*occurrence* means which occurrence of a match to search for (if omitted, the
default is 1), *matchType* specifies how to perform matching
+| m o b | REGEXP_REPLACE(string, regexp, rep [, pos [, occurrence [,
matchType]]]) | Replaces all substrings of *string* that match *regexp* with
*rep* at the starting *pos* in expr (if omitted, the default is 1),
*occurrence* means which occurrence of a match to search for (if omitted, the
default is 1), *matchType* specifies how to perform matching
Review Comment:
Make alphabetical as well
##########
testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java:
##########
@@ -4810,12 +4810,34 @@ private static void checkIf(SqlOperatorFixture f) {
"VARCHAR NOT NULL");
f.checkString("regexp_replace('abc\t\ndef\t\nghi', '\\w+', '+')",
"+\t\n+\t\n+",
"VARCHAR NOT NULL");
+
f.checkQuery("select regexp_replace('a b c', 'b', 'X')");
f.checkQuery("select regexp_replace('a b c', 'b', 'X', 1)");
f.checkQuery("select regexp_replace('a b c', 'b', 'X', 1, 3)");
f.checkQuery("select regexp_replace('a b c', 'b', 'X', 1, 3, 'i')");
};
- f0.forEachLibrary(list(SqlLibrary.MYSQL, SqlLibrary.ORACLE), consumer);
+ f0.forEachLibrary(list(SqlLibrary.MYSQL, SqlLibrary.ORACLE,
SqlLibrary.BIG_QUERY), consumer);
+
Review Comment:
Do any of these tests check what happens if you use the wrong syntax for
either dialect?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]