[ https://issues.apache.org/jira/browse/CALCITE-5873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17751206#comment-17751206 ]
Jerin John commented on CALCITE-5873:
-------------------------------------
Highlighting an issue we encountered when testing this implementation:
The REGEXP_CONTAINS function in BQ is expected to return an error if the regexp
argument is invalid. To mimic this behavior, we used the Pattern.compile()
method from Java's built-in java.util.regex package, which parses the
expression into a Pattern object and throws a PatternSyntaxException if the
expression is invalid.
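A minimal sketch of that approach, assuming a standalone helper: the class {{RegexpContainsSketch}} and the method {{regexpContains}} are hypothetical illustrations, not Calcite's actual code. The pattern is compiled first so that a {{PatternSyntaxException}} surfaces as an error rather than a boolean, and {{Matcher.find()}} is used because REGEXP_CONTAINS tests for a partial match ({{Matcher.matches()}} would require the whole input to match). Wrapping the exception in an {{IllegalArgumentException}} with a REGEXP_CONTAINS-specific message is also an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical sketch of REGEXP_CONTAINS on top of java.util.regex. */
final class RegexpContainsSketch {

  /** Returns true if some substring of {@code value} matches {@code regex}. */
  static boolean regexpContains(String value, String regex) {
    final Pattern pattern;
    try {
      // Pattern.compile parses the regex and throws PatternSyntaxException
      // for anything java.util.regex considers invalid.
      pattern = Pattern.compile(regex);
    } catch (PatternSyntaxException e) {
      // Report the parse failure as an error instead of returning a boolean.
      throw new IllegalArgumentException(
          "Invalid regular expression for REGEXP_CONTAINS: "
              + e.getDescription(), e);
    }
    // find() succeeds on a partial match anywhere in the input;
    // matches() would require the entire input to match.
    Matcher matcher = pattern.matcher(value);
    return matcher.find();
  }

  public static void main(String[] args) {
    System.out.println(regexpContains("abc def ghi", "def"));   // true
    System.out.println(regexpContains("abc def ghi", "^def$")); // false
    try {
      regexpContains("abc def ghi", "(abc"); // unclosed group
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

This only catches patterns that java.util.regex itself considers invalid.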
BigQuery/GoogleSQL uses the RE2 library to evaluate regular expressions (as
mentioned in the [BQ docs|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_contains])
and detects a few additional invalid cases that the Java regex library
accepts without error.
For example:
{{SELECT REGEXP_CONTAINS('abc def ghi', '{3}');}}
{{Cannot parse regular expression: no argument for repetition operator: {3}}}
{{SELECT REGEXP_CONTAINS('abc def ghi', '\d');}}
{{Syntax error: Illegal escape sequence: \d at [1:40]}}
The Java regex library accepts both of the above patterns and returns an
incorrect boolean result instead of the error BigQuery raises. We need to
decide whether to handle these conditions explicitly or to use the re2j
library (a Java port of RE2) for parsing.
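For the first option (explicit handling), a rough sketch under stated assumptions: {{rejectDanglingRepetition}} is a hypothetical pre-check, not part of java.util.regex, re2j, or Calcite, meant to run before {{Pattern.compile()}}. It targets only the '{3}' case, rejecting a counted repetition such as '{3}', '{2,}' or '{1,4}' that has nothing before it to repeat, with an error description modeled on the BigQuery message. It leaves '\d' alone: \d is valid RE2 syntax, and the BigQuery error in that example ("Illegal escape sequence") appears to come from BigQuery's string-literal escape rules rather than from RE2. The {{javaAccepts}} helper is likewise hypothetical and only shows whether java.util.regex compiles a pattern.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

/** Hypothetical pre-check; not part of java.util.regex, re2j, or Calcite. */
final class RepetitionCheckSketch {

  /** Returns true if java.util.regex compiles the pattern without error. */
  static boolean javaAccepts(String regex) {
    try {
      Pattern.compile(regex);
      return true;
    } catch (PatternSyntaxException e) {
      return false;
    }
  }

  /**
   * Throws PatternSyntaxException when a counted repetition ({n}, {n,} or
   * {n,m}) has nothing to repeat: at the start of the pattern or directly
   * after '(' or '|'. Escapes and simple character classes are skipped;
   * other syntax such as (?...) groups is not understood.
   */
  static void rejectDanglingRepetition(String regex) {
    final int n = regex.length();
    boolean haveAtom = false; // is there something a quantifier could apply to?
    for (int i = 0; i < n; i++) {
      char c = regex.charAt(i);
      if (c == '\\') {
        i++;                  // skip the escaped character
        haveAtom = true;
      } else if (c == '[') {
        // Skip the character class; a ']' right after '[' or '[^' is literal.
        int j = i + 1;
        if (j < n && regex.charAt(j) == '^') j++;
        if (j < n && regex.charAt(j) == ']') j++;
        while (j < n && regex.charAt(j) != ']') {
          if (regex.charAt(j) == '\\') j++; // skip an escape inside the class
          j++;
        }
        i = j;                // the loop increment steps past the ']'
        haveAtom = true;
      } else if (c == '(' || c == '|') {
        haveAtom = false;     // a new (sub)expression starts here
      } else if (c == '{') {
        int end = countedRepetitionEnd(regex, i);
        if (end < 0) {
          haveAtom = true;    // not a counted repetition; leave it to Pattern
        } else if (!haveAtom) {
          throw new PatternSyntaxException(
              "no argument for repetition operator", regex, i);
        } else {
          i = end;            // skip over the quantifier itself
        }
      } else {
        haveAtom = true;
      }
    }
  }

  /** Index of the closing '}' if {n}, {n,} or {n,m} starts at i, else -1. */
  private static int countedRepetitionEnd(String regex, int i) {
    int j = i + 1;
    int digitsStart = j;
    while (j < regex.length() && isAsciiDigit(regex.charAt(j))) j++;
    if (j == digitsStart) {
      return -1;              // at least one digit is required
    }
    if (j < regex.length() && regex.charAt(j) == ',') {
      j++;
      while (j < regex.length() && isAsciiDigit(regex.charAt(j))) j++;
    }
    return j < regex.length() && regex.charAt(j) == '}' ? j : -1;
  }

  private static boolean isAsciiDigit(char c) {
    return c >= '0' && c <= '9';
  }

  public static void main(String[] args) {
    // Per the comment above, java.util.regex accepts both BigQuery examples.
    System.out.println(javaAccepts("{3}"));
    System.out.println(javaAccepts("\\d"));   // true: \d is a digit class
    try {
      rejectDanglingRepetition("{3}");
    } catch (PatternSyntaxException e) {
      System.out.println(e.getDescription()); // no argument for repetition operator
    }
    rejectDanglingRepetition("a{3}");         // no exception: 'a' is repeated
  }
}
```

The scan is deliberately simplistic: it skips escapes and simple character classes but does not understand, for example, (?:...) groups, so a pattern like '(?:{3})' slips through. Matching RE2's rules fully this way would mean re-implementing much of its parser, which may favor the re2j option.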
[~julianhyde] [~tanclary]
> Implement BigQuery functions REGEXP_CONTAINS
> --------------------------------------------
>
> Key: CALCITE-5873
> URL: https://issues.apache.org/jira/browse/CALCITE-5873
> Project: Calcite
> Issue Type: Task
> Reporter: Jerin John
> Assignee: Jerin John
> Priority: Major
> Labels: pull-request-available
>
> Add support for the REGEXP_CONTAINS function from BigQuery.
> The function returns TRUE if the input value is a partial match for the
> regular expression.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)