tanclary commented on code in PR #3338: URL: https://github.com/apache/calcite/pull/3338#discussion_r1274192744
########## babel/src/test/resources/sql/big-query.iq: ########## @@ -3451,5 +3451,60 @@ FROM items; !ok +##################################################################### +# REGEXP_CONTAINS(value, regexp) +# +# Takes two STRING values. Returns TRUE if value is a partial match +# for the regular expression, regexp. +# If the regexp argument is invalid, the function returns an error. + +SELECT + email, + REGEXP_CONTAINS(email, r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid +FROM + (SELECT + ['[email protected]', '[email protected]', 'www.example.net'] + AS addresses), + UNNEST(addresses) AS email; + Review Comment: Does it pass with this blank line between the query and the results? We may want to remove it to be consistent with other tests. ########## core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java: ########## @@ -684,6 +685,8 @@ Builder populate2() { defineMethod(PARSE_TIME, "parseTime", NullPolicy.STRICT); defineMethod(PARSE_TIMESTAMP, "parseTimestamp", NullPolicy.STRICT); + defineMethod(REGEXP_CONTAINS, "regexpContains", NullPolicy.STRICT); Review Comment: I would move to this around line 545 so it is with the other BigQuery string functions like SPLIT, etc ########## core/src/main/java/org/apache/calcite/runtime/CalciteResource.java: ########## @@ -1008,6 +1008,9 @@ ExInstWithCause<CalciteException> failedToAccessField( @BaseMessage("Invalid input for REGEXP_REPLACE: ''{0}''") ExInst<CalciteException> invalidInputForRegexpReplace(String value); + @BaseMessage("Invalid regex input for REGEXP_CONTAINS: ''{0}''") + ExInst<CalciteException> invalidInputForRegexpContains(String value); Review Comment: nit (up to you): move this above RegexpReplace (especially if you anticipate adding other Regexp Errors to this, might make sense to alphabetize them) ########## testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java: ########## @@ -4438,6 +4438,31 @@ private static void checkIf(SqlOperatorFixture f) { f0.forEachLibrary(list(SqlLibrary.MYSQL, SqlLibrary.ORACLE), consumer); } + @Test + void testRegexpContainsFunc() { + final SqlOperatorFixture f = fixture().setFor(SqlLibraryOperators.REGEXP_CONTAINS) + .withLibrary(SqlLibrary.BIG_QUERY); + f.checkBoolean("regexp_contains('abc def ghi', 'abc')", true); + f.checkBoolean("regexp_contains('abc def ghi', '[a-z]+')", true); + f.checkBoolean("regexp_contains('[email protected]', '@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+')", true); + f.checkBoolean("regexp_contains('[email protected]', '@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+')", false); + f.checkBoolean("regexp_contains('5556664422', '^\\d{10}$')", true); + f.checkBoolean("regexp_contains('11555666442233', '^\\d{10}$')", false); + f.checkBoolean("regexp_contains('55566644221133', '\\d{10}')", true); + f.checkBoolean("regexp_contains('55as56664as422', '\\d{10}')", false); + + f.checkQuery("select regexp_contains('abc def ghi', 'abc')"); + f.checkQuery("select regexp_contains('[email protected]', '@[a-zA-Z0-9-]+\\\\.[a-zA-Z0-9-.]+')"); + f.checkQuery("select regexp_contains('55as56664as422', '\\d{10}')"); + + f.checkFails("regexp_contains('abc def ghi', '(abc')", + "Invalid regex input for REGEXP_CONTAINS: '(abc'", true); + f.checkFails("regexp_contains('abc def ghi', '{3}')", + "Invalid regex input for REGEXP_CONTAINS: '{3}'", true); + f.checkFails("regexp_contains('abc def ghi', '[z-a]')", Review Comment: add a couple more tests for null values "f.checkNull" might be useful ########## core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java: ########## @@ -394,6 +394,15 @@ private static int makeRegexpFlags(@Nullable String stringFlags) { return flags; } + /** SQL {@code REGEXP_CONTAINS(value, regexp)} function. */ + public static boolean regexpContains(String value, String regex) { + try { + return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(value).matches(); + } catch (Exception exception) { Review Comment: nit: change this to 'e' to match other caught exception names ########## core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java: ########## @@ -467,6 +467,16 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding operatorBinding, @LibraryOperator(libraries = {MYSQL, ORACLE}) public static final SqlFunction REGEXP_REPLACE = new SqlRegexpReplaceFunction(); + /** + * The "REGEXP_CONTAINS(value, regexp)" function. + * Returns TRUE if value is a partial match for the regular expression, regexp. + */ + @LibraryOperator(libraries = { BIG_QUERY }) Review Comment: Also can you fix the javadoc so that it matches the format of the other ones? There are some good examples starting around line 511 ########## core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java: ########## @@ -467,6 +467,16 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding operatorBinding, @LibraryOperator(libraries = {MYSQL, ORACLE}) public static final SqlFunction REGEXP_REPLACE = new SqlRegexpReplaceFunction(); + /** + * The "REGEXP_CONTAINS(value, regexp)" function. + * Returns TRUE if value is a partial match for the regular expression, regexp. + */ + @LibraryOperator(libraries = { BIG_QUERY }) Review Comment: Can remove the spaces around 'BIG_QUERY' ########## site/_docs/reference.md: ########## @@ -2774,6 +2774,7 @@ BigQuery's type system uses confusingly different names for types and functions: | b | PARSE_TIME(format, string) | Uses format specified by *format* to convert *string* representation of time to a TIME value | b | PARSE_TIMESTAMP(format, string[, timeZone]) | Uses format specified by *format* to convert *string* representation of timestamp to a TIMESTAMP WITH LOCAL TIME ZONE value in *timeZone* | b | POW(numeric1, numeric2) | Returns *numeric1* raised to the power *numeric2* +| b | REGEXP_CONTAINS(value, regexp) | Returns TRUE if *value* is a partial match for the regular expression, *regexp*. Review Comment: Don't need period here. nit: Would it be easier to say something like "Returns whether \*value\* is a partial match for the regular expression, \*regexp\*? This is up to you ########## core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java: ########## @@ -394,6 +394,15 @@ private static int makeRegexpFlags(@Nullable String stringFlags) { return flags; } + /** SQL {@code REGEXP_CONTAINS(value, regexp)} function. */ + public static boolean regexpContains(String value, String regex) { + try { + return Pattern.compile(regex, Pattern.CASE_INSENSITIVE).matcher(value).matches(); + } catch (Exception exception) { Review Comment: This is inconsistent though because there is 'e', 'ex', etc -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
