[ 
https://issues.apache.org/jira/browse/CALCITE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751206#comment-17751206
 ] 

Jerin John edited comment on CALCITE-5873 at 8/4/23 7:55 PM:
-------------------------------------------------------------

Highlighting an issue we encountered when testing this implementation:
The REGEXP_CONTAINS function in BQ is expected to return an error if the regexp 
argument is invalid. To mimic this behavior we used the Pattern.compile() 
method from the JDK's java.util.regex package, which parses the expression into 
a Pattern object and throws a PatternSyntaxException if the expression is 
invalid.
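
A minimal sketch of the approach described above, for illustration only. The class name RegexpContainsSketch, the method regexpContains and the error message are hypothetical and are not taken from the Calcite code; the sketch only assumes that validation is delegated to Pattern.compile() and that a partial match is tested with Matcher.find().

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical helper for illustration; not Calcite's actual implementation. */
final class RegexpContainsSketch {
  private RegexpContainsSketch() {}

  /** Returns whether {@code value} contains a partial match for {@code regex}. */
  static boolean regexpContains(String value, String regex) {
    final Pattern pattern;
    try {
      // Pattern.compile() both parses and validates the expression.
      pattern = Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // Surface an invalid regexp as an error, as BQ is expected to do.
      // The message text is an assumption, not BigQuery's wording.
      throw new IllegalArgumentException(
          "Invalid regular expression for REGEXP_CONTAINS: "
              + e.getDescription(), e);
    }
    // find() succeeds on a partial match; matches() would require a full match.
    return pattern.matcher(value).find();
  }

  public static void main(String[] args) {
    System.out.println(regexpContains("abc def ghi", "d.f"));
    try {
      regexpContains("abc def ghi", "(abc"); // unclosed group
    } catch (IllegalArgumentException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

With this shape, any expression that java.util.regex itself rejects (such as the unclosed group above) becomes an error, while anything it accepts is evaluated normally.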

BigQuery/GoogleSQL uses the RE2 library for regex evaluation (as mentioned in 
the [BQ 
docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains])
 and rejects a few additional invalid cases that the Java regex library 
accepts.
For example:

{{SELECT REGEXP_CONTAINS('abc def ghi', '\{3}');}}

{{Cannot parse regular expression: no argument for repetition operator: \{3}}}

 

{{SELECT REGEXP_CONTAINS('abc def ghi', '\d');}}

{{Syntax error: Illegal escape sequence: \d at [1:40]}}

 

The Java regex library accepts both of the above examples and returns a 
boolean result instead of the errors expected from BQ. We need to decide 
whether to handle these conditions explicitly or to import the re2j library 
for Java to do the parsing.
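
To illustrate the Java side of this gap only, here is a small sketch using the second pattern from the examples. The class and method names are hypothetical; the code shows java.util.regex behavior and makes no claim about how RE2 or re2j treats the same input.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class; shows java.util.regex behavior only. */
final class JavaRegexLeniencySketch {
  private JavaRegexLeniencySketch() {}

  /** Compiles and evaluates {@code regex} the way a Pattern-based implementation would. */
  static boolean containsMatch(String value, String regex) {
    return Pattern.compile(regex).matcher(value).find();
  }

  public static void main(String[] args) {
    // The Java literal "\\d" is the two-character regex \d (any digit).
    // It compiles without a PatternSyntaxException, and since the input
    // contains no digit the call simply returns false instead of failing.
    boolean result = containsMatch("abc def ghi", "\\d");
    System.out.println(result); // prints false
  }
}
```

Because no exception is thrown here, a Pattern.compile()-based check alone cannot reproduce the error that BQ reports for this query.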
 
[~julianhyde] [~tanclary]

 



> Implement BigQuery functions REGEXP_CONTAINS
> --------------------------------------------
>
>                 Key: CALCITE-5873
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5873
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Jerin John
>            Assignee: Jerin John
>            Priority: Major
>              Labels: pull-request-available
>
> Add support for the REGEXP_CONTAINS function from BigQuery.
> The function returns TRUE if the input value is a partial match for the 
> regular expression.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
