[ https://issues.apache.org/jira/browse/CALCITE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751206#comment-17751206 ]
Jerin John edited comment on CALCITE-5873 at 8/4/23 7:55 PM:
-------------------------------------------------------------

Highlighting an issue we encountered when testing this implementation:

The REGEXP_CONTAINS function in BQ is expected to return an error if the regexp argument is invalid. To mimic this behavior, we used the Pattern.compile() method from the native java.util.regex library, which parses the expression into a Pattern object and throws a PatternSyntaxException for invalid patterns.

BigQuery/GoogleSQL uses the RE2 library for regex evaluation (as mentioned in the [BQ docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains]) and detects a few additional invalid cases that the Java regex library accepts. E.g.:

{{SELECT REGEXP_CONTAINS('abc def ghi', '\{3}');}}
{{Cannot parse regular expression: no argument for repetition operator: \{3}}}

{{SELECT REGEXP_CONTAINS('abc def ghi', '\d');}}
{{Syntax error: Illegal escape sequence: \d at [1:40]}}

The Java regex library accepts both of the patterns above and returns a boolean result instead of the errors that BQ produces. We need to decide whether to handle these cases explicitly or to import the re2j library for Java to do the parsing.

[~julianhyde] [~tanclary]

> Implement BigQuery functions REGEXP_CONTAINS
> --------------------------------------------
>
>                 Key: CALCITE-5873
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5873
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Jerin John
>            Assignee: Jerin John
>            Priority: Major
>              Labels: pull-request-available
>
> Add support for REGEXP_CONTAINS function from BigQuery.
> Function returns TRUE if input value is a partial match for the regular
> expression.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
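
For reference, the java.util.regex behavior described in the comment can be checked with a small sketch like the one below. It is only an illustration, not the code in the pull request: the class name RegexValidityCheck and the helpers compiles() and regexpContains() are hypothetical, and the {3} entry assumes that the \{3} shown in the comment is a Jira-escaped {3}. The program prints whether each pattern is accepted by Pattern.compile(), so the result can be compared against what BQ reports.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical helper class, used only to illustrate the behavior discussed above.
class RegexValidityCheck {

  /** Returns true if java.util.regex accepts the pattern, false if it throws. */
  static boolean compiles(String regex) {
    try {
      Pattern.compile(regex);
      return true;
    } catch (PatternSyntaxException e) {
      return false;
    }
  }

  /**
   * Partial-match check in the spirit of REGEXP_CONTAINS.
   * Throws PatternSyntaxException if java.util.regex rejects the pattern.
   */
  static boolean regexpContains(String value, String regex) {
    return Pattern.compile(regex).matcher(value).find();
  }

  public static void main(String[] args) {
    // "(" is an unclosed group; "\\d" and "{3}" are the patterns from the comment
    // (the latter assuming the backslash in \{3} is Jira escaping).
    String[] patterns = {"(", "\\d", "{3}"};
    for (String p : patterns) {
      System.out.println(p + " -> accepted by java.util.regex: " + compiles(p));
    }
  }
}
```

If the re2j route is chosen instead, the same check would be run against that library's own compile method, so that validity is decided by RE2 rules rather than by java.util.regex.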