[
https://issues.apache.org/jira/browse/CALCITE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763212#comment-17763212
]
Jerin John commented on CALCITE-5979:
-------------------------------------
Hi [~julianhyde] and everyone,
I wanted to get your opinion on this regex implementation for BQ. My attempt
was to reuse the existing implementation of REGEXP_REPLACE from other
libraries. But if you check this [BQ
documentation|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_replace],
BQ supports a group replacement syntax using backslashes ({{\1}}, {{\2}}, ...)
which is not currently available in the other libraries, so it needs a
precursor step that rewrites the replacement string like this:
{{replacement = replacement.replaceAll("\\\\(\\d)", "\\$$1");}}
I tested this syntax with the {{re2j}} library to check whether it is
something supported by [RE2|https://github.com/google/re2] but not by Java
regex, but that's not the case; it seems BQ might be custom-processing the
replacement first. It's not evident why they went with this syntax, since the
standard notation in Java regex for referencing capturing groups is
{{$1, $2..}}, and that is what the {{"\"}} s are converted to in the above
code as well.
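To make the precursor step concrete, here is a minimal sketch in plain Java, independent of Calcite. The class and method names ({{BigQueryRegexpReplaceSketch}}, {{toJavaReplacement}}, {{regexpReplace}}) are hypothetical and only for illustration; they are not the names used in the pull request. It assumes only single-digit references such as {{\1}} need rewriting, and it does not handle an escaped backslash or a literal {{$}} in the replacement string.

```java
import java.util.regex.Pattern;

final class BigQueryRegexpReplaceSketch {
  // Regex \\(\d): a literal backslash followed by one digit, e.g. \1
  private static final Pattern BQ_GROUP_REF = Pattern.compile("\\\\(\\d)");

  /** Rewrites BigQuery-style group references (\1) into Java-style ones ($1). */
  public static String toJavaReplacement(String bqReplacement) {
    // In the replacement, \$ emits a literal dollar sign and $1 emits the captured digit.
    return BQ_GROUP_REF.matcher(bqReplacement).replaceAll("\\$$1");
  }

  /** Hypothetical wrapper: preprocess the replacement, then do a standard Java regex replace. */
  public static String regexpReplace(String value, String regex, String bqReplacement) {
    return Pattern.compile(regex)
        .matcher(value)
        .replaceAll(toJavaReplacement(bqReplacement));
  }

  public static void main(String[] args) {
    System.out.println(toJavaReplacement("X\\1"));             // prints X$1
    System.out.println(regexpReplace("abc", "b(.)", "X\\1"));  // prints aXc
  }
}
```

Note that the regex and replacement literals need doubled backslashes in Java source, which is easy to lose when the snippet passes through Jira formatting.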
I discussed with [~tanclary] and from what I understand, one possible approach
is to modify the parsers to redirect to a {{SqlStdOperatorTable}} method to
check the library conformance, and create a new BQ-specific operator
{{REGEXP_REPLACE_BIG_QUERY}} alongside the existing {{REGEXP_REPLACE}}
operator. The {{REGEXP_REPLACE_BIG_QUERY}} operator would be mapped to its own
{{SqlFunction}} for this custom preprocessing, which should then call the
original {{regexReplace}} method used by other dialects.
I get that this design abstracts the dialect-specific logic out of the rex
layer, but it also seems like a lot of steps to identify the conformance. It'd
be great to get your thoughts on whether there's a better approach or whether
this seems like the right way to go.
> Add REGEXP_REPLACE function (enabled in BigQuery library)
> ---------------------------------------------------------
>
> Key: CALCITE-5979
> URL: https://issues.apache.org/jira/browse/CALCITE-5979
> Project: Calcite
> Issue Type: Task
> Reporter: Jerin John
> Assignee: Jerin John
> Priority: Major
> Labels: pull-request-available
>
> Add support for [REGEXP_REPLACE
> |https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_replace]
> function from BigQuery.
> *{{REGEXP_REPLACE(value, regexp, replacement)}}*
> Returns a STRING where all substrings of {{value}} that match regular
> expression {{regexp}} are replaced with {{replacement}}.
> Backslash-escaped digits (\1 to \9) can be used within the {{replacement}}
> argument to insert text matching the corresponding parenthesized group in the
> {{regexp}} pattern.
> Example (added one space between \ \ to override md formatting):
> {{SELECT REGEXP_REPLACE("abc", "b(.)", "X\ \1") as result;}}
> |result|
> |aXc|
--
This message was sent by Atlassian Jira
(v8.20.10#820010)