tanclary commented on code in PR #3369:
URL: https://github.com/apache/calcite/pull/3369#discussion_r1297757955
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -355,11 +355,85 @@ public static boolean regexpContains(String value, String
regex) {
Pattern regexp = Pattern.compile(regex);
return regexp.matcher(value).find();
} catch (PatternSyntaxException ex) {
- throw
RESOURCE.invalidInputForRegexpContains(ex.getMessage().replace("\r\n", " ")
- .replace("\n", " ").replace("\r", " ")).ex();
+ throw
RESOURCE.invalidRegexInputForRegexpFunctions(ex.getMessage().replace("\r\n", "
")
+ .replace("\n", " ").replace("\r", " "),
Review Comment:
We should see if there's a way to avoid chaining all these `.replace()` calls.
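
One possible way to collapse the chain, offered as a sketch and not as what the PR must do: normalize line breaks with a single precompiled pattern. The class name `MessageSanitizer` and the method `singleLine` are hypothetical and do not exist in Calcite.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of Calcite. */
final class MessageSanitizer {
  // "\r\n" is listed first so a Windows line ending becomes one space, not two.
  private static final Pattern LINE_BREAK = Pattern.compile("\r\n|\r|\n");

  private MessageSanitizer() {}

  /** Replaces every line break in {@code message} with a single space. */
  static String singleLine(String message) {
    return LINE_BREAK.matcher(message).replaceAll(" ");
  }
}
```

Each catch block could then pass `MessageSanitizer.singleLine(ex.getMessage())` to the resource method, which keeps the behavior of the three chained calls in one place.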
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -355,11 +355,85 @@ public static boolean regexpContains(String value, String
regex) {
Pattern regexp = Pattern.compile(regex);
return regexp.matcher(value).find();
} catch (PatternSyntaxException ex) {
- throw
RESOURCE.invalidInputForRegexpContains(ex.getMessage().replace("\r\n", " ")
- .replace("\n", " ").replace("\r", " ")).ex();
+ throw
RESOURCE.invalidRegexInputForRegexpFunctions(ex.getMessage().replace("\r\n", "
")
+ .replace("\n", " ").replace("\r", " "),
+ "REGEXP_CONTAINS").ex();
}
}
+ /** SQL {@code REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}
function.
+ * Returns NULL if there is no match, or if position or occurrence are
beyond range.
+ * Returns an exception if regex, position or occurrence are invalid.*/
+ public static @Nullable String regexpExtract(String value, String regex,
Integer... params) {
Review Comment:
Is `REGEXP_EXTRACT` just a synonym for `REGEXP_SUBSTR`? If so, you can add it as
an alias in `StandardConvertletTable` and then refactor or remove a lot of this
code. You shouldn't need a method for both functions here if they're the same.
Let me know if you need more info.
##########
site/_docs/reference.md:
##########
@@ -2779,7 +2779,9 @@ BigQuery's type system uses confusingly different names
for types and functions:
| h s | PARSE_URL(urlString, partToExtract [, keyToExtract] ) | Returns the
specified *partToExtract* from the *urlString*. Valid values for
*partToExtract* include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and
USERINFO. *keyToExtract* specifies which query to extract
| b | POW(numeric1, numeric2) | Returns *numeric1*
raised to the power *numeric2*
| b | REGEXP_CONTAINS(string, regexp) | Returns whether
*string* is a partial match for the *regexp*
+| b | REGEXP_EXTRACT(string, regexp[, position[, occurrence]]) | Returns the
substring in *string* that matches the regexp. Returns NULL if there is no
match. Use *position* for the start index of search range and *occurrence* for
the specific occurence of match in *string*
Review Comment:
Surround `regexp` with `*` so that it is italicized like the other parameter names.
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -355,11 +355,85 @@ public static boolean regexpContains(String value, String
regex) {
Pattern regexp = Pattern.compile(regex);
return regexp.matcher(value).find();
} catch (PatternSyntaxException ex) {
- throw
RESOURCE.invalidInputForRegexpContains(ex.getMessage().replace("\r\n", " ")
- .replace("\n", " ").replace("\r", " ")).ex();
+ throw
RESOURCE.invalidRegexInputForRegexpFunctions(ex.getMessage().replace("\r\n", "
")
+ .replace("\n", " ").replace("\r", " "),
+ "REGEXP_CONTAINS").ex();
}
}
+ /** SQL {@code REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}
function.
+ * Returns NULL if there is no match, or if position or occurrence are
beyond range.
+ * Returns an exception if regex, position or occurrence are invalid.*/
+ public static @Nullable String regexpExtract(String value, String regex,
Integer... params) {
+ return processRegexpExtractOrSubstr("REGEXP_EXTRACT", value, regex,
params);
+ }
+
+ /** SQL {@code REGEXP_SUBSTR(value, regexp[, position[, occurrence]])}
function.
+ * Returns NULL if there is no match, or if position or occurrence are
beyond range.
+ * Returns an exception if regex, position or occurrence are invalid.*/
Review Comment:
Add a space before the closing `*/`.
##########
babel/src/test/resources/sql/big-query.iq:
##########
@@ -3657,4 +3915,5 @@ FROM items;
!ok
+
Review Comment:
Did you mean to add this blank line back?
##########
core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java:
##########
@@ -486,6 +483,25 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding
operatorBinding,
OperandTypes.STRING_STRING,
SqlFunctionCategory.STRING);
+ /** The "REGEXP_EXTRACT(value, regexp[, position[, occurrence]])" function.
+ * Returns the substring in value that matches the regexp. Returns NULL if
there is no match. */
+ @LibraryOperator(libraries = {BIG_QUERY})
+ public static final SqlFunction REGEXP_EXTRACT =
+ SqlBasicFunction.create("REGEXP_EXTRACT", ReturnTypes.VARCHAR_NULLABLE,
+ OperandTypes.STRING_STRING_OPTIONAL_INTEGER_OPTIONAL_INTEGER,
+ SqlFunctionCategory.STRING);
+
+ @LibraryOperator(libraries = {MYSQL, ORACLE})
+ public static final SqlFunction REGEXP_REPLACE = new
SqlRegexpReplaceFunction();
+
+ /** The "REGEXP_SUBSTR(value, regexp[, position[, occurrence]])" function.
+ * Returns the substring in value that matches the regexp. Returns NULL if
there is no match. */
+ @LibraryOperator(libraries = {BIG_QUERY})
Review Comment:
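You can just do `= REGEXP_EXTRACT.withName("REGEXP_SUBSTR")` (this assumes
`REGEXP_EXTRACT` is declared as a `SqlBasicFunction`, which is where `withName`
is defined).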
##########
site/_docs/reference.md:
##########
@@ -2779,7 +2779,9 @@ BigQuery's type system uses confusingly different names
for types and functions:
| h s | PARSE_URL(urlString, partToExtract [, keyToExtract] ) | Returns the
specified *partToExtract* from the *urlString*. Valid values for
*partToExtract* include HOST, PATH, QUERY, REF, PROTOCOL, AUTHORITY, FILE, and
USERINFO. *keyToExtract* specifies which query to extract
| b | POW(numeric1, numeric2) | Returns *numeric1*
raised to the power *numeric2*
| b | REGEXP_CONTAINS(string, regexp) | Returns whether
*string* is a partial match for the *regexp*
+| b | REGEXP_EXTRACT(string, regexp[, position[, occurrence]]) | Returns the
substring in *string* that matches the regexp. Returns NULL if there is no
match. Use *position* for the start index of search range and *occurrence* for
the specific occurence of match in *string*
| m o | REGEXP_REPLACE(string, regexp, rep [, pos [, occurrence [,
matchType]]]) | Replaces all substrings of *string* that match *regexp* with
*rep* at the starting *pos* in expr (if omitted, the default is 1),
*occurrence* means which occurrence of a match to search for (if omitted, the
default is 1), *matchType* specifies how to perform matching
+| b | REGEXP_SUBSTR(string, regexp[, position[, occurrence]]) | Synonym for
REGEXP_EXTRACT. Returns the substring in *string* that matches the regexp
Review Comment:
You can drop the second sentence; "Synonym for REGEXP_EXTRACT" is enough.
##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -355,11 +355,85 @@ public static boolean regexpContains(String value, String
regex) {
Pattern regexp = Pattern.compile(regex);
return regexp.matcher(value).find();
} catch (PatternSyntaxException ex) {
- throw
RESOURCE.invalidInputForRegexpContains(ex.getMessage().replace("\r\n", " ")
- .replace("\n", " ").replace("\r", " ")).ex();
+ throw
RESOURCE.invalidRegexInputForRegexpFunctions(ex.getMessage().replace("\r\n", "
")
+ .replace("\n", " ").replace("\r", " "),
+ "REGEXP_CONTAINS").ex();
}
}
+ /** SQL {@code REGEXP_EXTRACT(value, regexp[, position[, occurrence]])}
function.
+ * Returns NULL if there is no match, or if position or occurrence are
beyond range.
+ * Returns an exception if regex, position or occurrence are invalid.*/
+ public static @Nullable String regexpExtract(String value, String regex,
Integer... params) {
+ return processRegexpExtractOrSubstr("REGEXP_EXTRACT", value, regex,
params);
+ }
+
+ /** SQL {@code REGEXP_SUBSTR(value, regexp[, position[, occurrence]])}
function.
+ * Returns NULL if there is no match, or if position or occurrence are
beyond range.
+ * Returns an exception if regex, position or occurrence are invalid.*/
+ public static @Nullable String regexpSubstr(String value, String regex,
Integer... params) {
+ return processRegexpExtractOrSubstr("REGEXP_SUBSTR", value, regex, params);
+ }
+
+ private static @Nullable String processRegexpExtractOrSubstr(String
methodName, String value,
Review Comment:
Once you add one as the alias of the other, you can just rename this to
`regexpExtract` to match the other functions.
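
To illustrate what a single runtime method could look like after the alias is in place, here is a simplified, self-contained sketch. It is not Calcite's implementation: the class name `RegexpExtractSketch` is hypothetical, errors are reported with `IllegalArgumentException` in place of the `RESOURCE` messages, and BigQuery's capturing-group handling is left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch for illustration; not Calcite's actual code. */
final class RegexpExtractSketch {
  private RegexpExtractSketch() {}

  /**
   * Returns the occurrence-th match of {@code regex} in {@code value},
   * searching from the 1-based position, or null if there is no such match.
   * params[0] is the position (default 1); params[1] is the occurrence (default 1).
   */
  static String regexpExtract(String value, String regex, Integer... params) {
    final int position = params.length > 0 ? params[0] : 1;
    final int occurrence = params.length > 1 ? params[1] : 1;
    if (position < 1 || occurrence < 1) {
      throw new IllegalArgumentException(
          "position and occurrence must be positive");
    }
    final Pattern pattern;
    try {
      pattern = Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      throw new IllegalArgumentException("invalid regex: " + regex, e);
    }
    if (position > value.length()) {
      return null; // the start index lies beyond the last character
    }
    final Matcher matcher = pattern.matcher(value);
    // The first search starts at the requested index; later searches
    // continue from the end of the previous match.
    boolean found = matcher.find(position - 1);
    for (int i = 1; found && i < occurrence; i++) {
      found = matcher.find();
    }
    return found ? matcher.group() : null;
  }
}
```

With `REGEXP_SUBSTR` registered as an alias, both SQL names would resolve to this one method, so the function name no longer needs to be passed in as a parameter.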