tanclary commented on code in PR #3338:
URL: https://github.com/apache/calcite/pull/3338#discussion_r1274192744


##########
babel/src/test/resources/sql/big-query.iq:
##########
@@ -3451,5 +3451,60 @@ FROM items;
 
 !ok
 
+#####################################################################
+# REGEXP_CONTAINS(value, regexp)
+#
+# Takes two STRING values. Returns TRUE if value is a partial match
+# for the regular expression, regexp.
+# If the regexp argument is invalid, the function returns an error.
+
+SELECT
+  email,
+  REGEXP_CONTAINS(email, r'@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+') AS is_valid
+FROM
+  (SELECT
+    ['[email protected]', '[email protected]', 'www.example.net']
+    AS addresses),
+  UNNEST(addresses) AS email;
+

Review Comment:
   Does it pass with this blank line between the query and the results? We may 
want to remove it to be consistent with other tests.



##########
core/src/main/java/org/apache/calcite/adapter/enumerable/RexImpTable.java:
##########
@@ -684,6 +685,8 @@ Builder populate2() {
       defineMethod(PARSE_TIME, "parseTime", NullPolicy.STRICT);
       defineMethod(PARSE_TIMESTAMP, "parseTimestamp", NullPolicy.STRICT);
 
+      defineMethod(REGEXP_CONTAINS, "regexpContains", NullPolicy.STRICT);

Review Comment:
   I would move to this around line 545 so it is with the other BigQuery string 
functions like SPLIT, etc



##########
core/src/main/java/org/apache/calcite/runtime/CalciteResource.java:
##########
@@ -1008,6 +1008,9 @@ ExInstWithCause<CalciteException> failedToAccessField(
   @BaseMessage("Invalid input for REGEXP_REPLACE: ''{0}''")
   ExInst<CalciteException> invalidInputForRegexpReplace(String value);
 
+  @BaseMessage("Invalid regex input for REGEXP_CONTAINS: ''{0}''")
+  ExInst<CalciteException> invalidInputForRegexpContains(String value);

Review Comment:
   nit (up to you): move this above RegexpReplace (especially if you anticipate 
adding other Regexp Errors to this, might make sense to alphabetize them)



##########
testkit/src/main/java/org/apache/calcite/test/SqlOperatorTest.java:
##########
@@ -4438,6 +4438,31 @@ private static void checkIf(SqlOperatorFixture f) {
     f0.forEachLibrary(list(SqlLibrary.MYSQL, SqlLibrary.ORACLE), consumer);
   }
 
+  @Test
+  void testRegexpContainsFunc() {
+    final SqlOperatorFixture f = 
fixture().setFor(SqlLibraryOperators.REGEXP_CONTAINS)
+        .withLibrary(SqlLibrary.BIG_QUERY);
+    f.checkBoolean("regexp_contains('abc def ghi', 'abc')", true);
+    f.checkBoolean("regexp_contains('abc def ghi', '[a-z]+')", true);
+    f.checkBoolean("regexp_contains('[email protected]', 
'@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+')", true);
+    f.checkBoolean("regexp_contains('[email protected]', 
'@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+')", false);
+    f.checkBoolean("regexp_contains('5556664422', '^\\d{10}$')", true);
+    f.checkBoolean("regexp_contains('11555666442233', '^\\d{10}$')", false);
+    f.checkBoolean("regexp_contains('55566644221133', '\\d{10}')", true);
+    f.checkBoolean("regexp_contains('55as56664as422', '\\d{10}')", false);
+
+    f.checkQuery("select regexp_contains('abc def ghi', 'abc')");
+    f.checkQuery("select regexp_contains('[email protected]', 
'@[a-zA-Z0-9-]+\\\\.[a-zA-Z0-9-.]+')");
+    f.checkQuery("select regexp_contains('55as56664as422', '\\d{10}')");
+
+    f.checkFails("regexp_contains('abc def ghi', '(abc')",
+        "Invalid regex input for REGEXP_CONTAINS: '(abc'", true);
+    f.checkFails("regexp_contains('abc def ghi', '{3}')",
+        "Invalid regex input for REGEXP_CONTAINS: '{3}'", true);
+    f.checkFails("regexp_contains('abc def ghi', '[z-a]')",

Review Comment:
   add a couple more tests for null values "f.checkNull" might be useful



##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -394,6 +394,15 @@ private static int makeRegexpFlags(@Nullable String 
stringFlags) {
     return flags;
   }
 
+  /** SQL {@code REGEXP_CONTAINS(value, regexp)} function. */
+  public static boolean regexpContains(String value, String regex) {
+    try {
+      return Pattern.compile(regex, 
Pattern.CASE_INSENSITIVE).matcher(value).matches();
+    } catch (Exception exception) {

Review Comment:
   nit: change this to 'e' to match other caught exception names



##########
core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java:
##########
@@ -467,6 +467,16 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding 
operatorBinding,
   @LibraryOperator(libraries = {MYSQL, ORACLE})
   public static final SqlFunction REGEXP_REPLACE = new 
SqlRegexpReplaceFunction();
 
+  /**
+   * The "REGEXP_CONTAINS(value, regexp)" function.
+   * Returns TRUE if value is a partial match for the regular expression, 
regexp.
+   */
+  @LibraryOperator(libraries = { BIG_QUERY })

Review Comment:
   Also can you fix the javadoc so that it matches the format of the other 
ones? There are some good examples starting around line 511



##########
core/src/main/java/org/apache/calcite/sql/fun/SqlLibraryOperators.java:
##########
@@ -467,6 +467,16 @@ static RelDataType deriveTypeSplit(SqlOperatorBinding 
operatorBinding,
   @LibraryOperator(libraries = {MYSQL, ORACLE})
   public static final SqlFunction REGEXP_REPLACE = new 
SqlRegexpReplaceFunction();
 
+  /**
+   * The "REGEXP_CONTAINS(value, regexp)" function.
+   * Returns TRUE if value is a partial match for the regular expression, 
regexp.
+   */
+  @LibraryOperator(libraries = { BIG_QUERY })

Review Comment:
   Can remove the spaces around 'BIG_QUERY' 



##########
site/_docs/reference.md:
##########
@@ -2774,6 +2774,7 @@ BigQuery's type system uses confusingly different names 
for types and functions:
 | b | PARSE_TIME(format, string)                     | Uses format specified 
by *format* to convert *string* representation of time to a TIME value
 | b | PARSE_TIMESTAMP(format, string[, timeZone])    | Uses format specified 
by *format* to convert *string* representation of timestamp to a TIMESTAMP WITH 
LOCAL TIME ZONE value in *timeZone*
 | b | POW(numeric1, numeric2)                        | Returns *numeric1* 
raised to the power *numeric2*
+| b | REGEXP_CONTAINS(value, regexp)                 | Returns TRUE if *value* 
is a partial match for the regular expression, *regexp*.

Review Comment:
   Don't need period here. 
   
   nit: Would it be easier to say something like "Returns whether \*value\* is 
a partial match for the regular expression, \*regexp\*? This is up to you



##########
core/src/main/java/org/apache/calcite/runtime/SqlFunctions.java:
##########
@@ -394,6 +394,15 @@ private static int makeRegexpFlags(@Nullable String 
stringFlags) {
     return flags;
   }
 
+  /** SQL {@code REGEXP_CONTAINS(value, regexp)} function. */
+  public static boolean regexpContains(String value, String regex) {
+    try {
+      return Pattern.compile(regex, 
Pattern.CASE_INSENSITIVE).matcher(value).matches();
+    } catch (Exception exception) {

Review Comment:
   This is inconsistent though because there is 'e', 'ex', etc



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to