[ 
https://issues.apache.org/jira/browse/CALCITE-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17763212#comment-17763212
 ] 

Jerin John commented on CALCITE-5979:
-------------------------------------

Hi [~julianhyde] and everyone,

I wanted to get your opinion on this regex implementation for BQ. My attempt 
was to reuse the existing implementation of REGEXP_REPLACE from the other 
libraries. However, as this [BQ 
documentation|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_replace]
 shows, BQ supports a group-reference syntax in the replacement string using 
backslashes ({{\1}}, {{\2}}, ...) that the other libraries do not, so the 
replacement needs a preprocessing step that rewrites it like this:

{{replacement = replacement.replaceAll("\\\\(\\d)", "\\$$1");}}
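For reference, a minimal self-contained Java sketch of that preprocessing step. The class name and the {{toJavaReplacement}} helper are hypothetical and only for illustration. It shows that, without the rewrite, Java's replacement syntax reads {{\1}} as an escaped literal {{1}}, while after the rewrite {{$1}} picks up the captured group.

```java
public class BigQueryReplacementDemo {
  // Hypothetical helper: rewrites BigQuery-style group references (\1) into Java-style ones ($1).
  // Regex source \\(\d)   : a literal backslash followed by one captured digit.
  // Replacement source \$$1 : a literal '$' followed by that captured digit.
  public static String toJavaReplacement(String replacement) {
    return replacement.replaceAll("\\\\(\\d)", "\\$$1");
  }

  public static void main(String[] args) {
    String bigQueryReplacement = "X\\1"; // the characters X\1

    // Without preprocessing, Java treats "\1" as an escaped literal '1'.
    System.out.println("abc".replaceAll("b(.)", bigQueryReplacement)); // aX1c

    // After preprocessing, "$1" refers to capture group 1.
    String javaReplacement = toJavaReplacement(bigQueryReplacement); // X$1
    System.out.println("abc".replaceAll("b(.)", javaReplacement)); // aXc
  }
}
```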

I tested this syntax with the {{re2j}} library to check whether it is something 
supported by [RE2|https://github.com/google/re2] rather than Java regex, but 
re2j does not accept it either, so it seems BQ might be preprocessing it 
itself. It's not evident why they went with this syntax, since Java's standard 
notation for referencing capturing groups in a replacement string is 
{{$1}}, {{$2}}, ..., and that is what the code above turns the backslash 
references into.

I discussed this with [~tanclary], and from what I understand, one possible 
approach is to modify the parsers to redirect to a {{SqlStdOperatorTable}} 
method that checks the library conformance, and to create a new BQ-specific 
operator {{REGEXP_REPLACE_BIG_QUERY}} alongside the existing {{REGEXP_REPLACE}} 
operator. {{REGEXP_REPLACE_BIG_QUERY}} is mapped to its own {{SqlFunction}}, 
which does the custom preprocessing and then calls the original 
{{regexReplace}} method used by the other dialects.
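To make the proposed split concrete, a rough plain-Java sketch of the delegation: the BQ-specific entry point only rewrites the replacement and then calls the shared implementation. None of this is Calcite's actual {{SqlFunction}} or operator-table API; the class, the {{RegexpReplaceFn}} interface and the name-to-function map are hypothetical stand-ins, and the map only mimics the parser binding a different operator per library, so no dialect check happens at execution time.

```java
import java.util.Map;

public class RegexpReplaceOperators {
  // Shared implementation (Java replacement syntax, e.g. $1), as used by the other dialects.
  public static String regexpReplace(String value, String regexp, String replacement) {
    return value.replaceAll(regexp, replacement);
  }

  // BigQuery-specific entry point: rewrites \N references to $N, then delegates.
  public static String regexpReplaceBigQuery(String value, String regexp, String replacement) {
    String javaReplacement = replacement.replaceAll("\\\\(\\d)", "\\$$1");
    return regexpReplace(value, regexp, javaReplacement);
  }

  @FunctionalInterface
  public interface RegexpReplaceFn {
    String apply(String value, String regexp, String replacement);
  }

  // Hypothetical stand-in for an operator table: the parser would pick one name or the
  // other based on the active library, so the functions themselves never check the dialect.
  public static final Map<String, RegexpReplaceFn> OPERATORS = Map.of(
      "REGEXP_REPLACE", RegexpReplaceOperators::regexpReplace,
      "REGEXP_REPLACE_BIG_QUERY", RegexpReplaceOperators::regexpReplaceBigQuery);

  public static void main(String[] args) {
    System.out.println(OPERATORS.get("REGEXP_REPLACE").apply("abc", "b(.)", "X$1")); // aXc
    System.out.println(OPERATORS.get("REGEXP_REPLACE_BIG_QUERY").apply("abc", "b(.)", "X\\1")); // aXc
  }
}
```

The point of the shape is that only the thin BQ wrapper knows about backslash references; the matching logic itself stays in one place.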

I get that this design keeps the dialect-specific logic out of the rex layer, 
but it also seems like a lot of steps just to identify the conformance. It'd be 
great to get your thoughts on whether there's a better approach or whether this 
seems like the right way to go.

> Add REGEXP_REPLACE function (enabled in BigQuery library)
> ---------------------------------------------------------
>
>                 Key: CALCITE-5979
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5979
>             Project: Calcite
>          Issue Type: Task
>            Reporter: Jerin John
>            Assignee: Jerin John
>            Priority: Major
>              Labels: pull-request-available
>
> Add support for 
> [REGEXP_REPLACE|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#regexp_replace]
>  function from BigQuery.
> *{{REGEXP_REPLACE(value, regexp, replacement)}}*
> Returns a STRING where all substrings of {{value}} that match regular 
> expression {{regexp}} are replaced with {{replacement}}.
> Backslash-escaped digits (\1 to \9) can be used within the {{replacement}} 
> argument to insert text matching the corresponding parenthesized group in the 
> {{regexp}} pattern.
> Example:
> {{SELECT REGEXP_REPLACE("abc", "b(.)", "X\\1") as result;}}
> ||result||
> |aXc|



