[
https://issues.apache.org/jira/browse/CALCITE-3415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16952631#comment-16952631
]
Pranay Parmar edited comment on CALCITE-3415 at 10/16/19 9:15 AM:
------------------------------------------------------------------
[~amaliujia]
The *REGEXP_SUBSTR* function is available in Oracle, Teradata and several other
major dialects, but not in BigQuery. As you mentioned, the closest matches in
BigQuery are *REGEXP_EXTRACT* and *REGEXP_EXTRACT_ALL*.
The function has *4* variants, taking 2, 3, 4 or 5 parameters :
*1. REGEXP_SUBSTR(<CHARACTER>, <CHARACTER>) [2 params] :*
{code:sql}
SELECT REGEXP_SUBSTR('choco chico chipo', 'c+.{2}') FROM foodmart.product
{code}
For BigQuery it will be unparsed into :
{code:sql}
SELECT REGEXP_EXTRACT('choco chico chipo', 'c+.{2}') FROM foodmart.product
{code}
*2. REGEXP_SUBSTR(<CHARACTER>, <CHARACTER>, <INT>) [3 params] :*
{code:sql}
SELECT REGEXP_SUBSTR('choco chico chipo', 'c+.{2}', 7) FROM foodmart.product
{code}
For BigQuery it will be unparsed into :
{code:sql}
SELECT REGEXP_EXTRACT(SUBSTR('choco chico chipo', 7), 'c+.{2}') FROM
foodmart.product
{code}
*3. REGEXP_SUBSTR(<CHARACTER>, <CHARACTER>, <INT>, <INT>) [4 params] :*
{code:sql}
SELECT REGEXP_SUBSTR('chocolate chip cookies', 'c+.{2}', 4, 2) FROM
foodmart.product
{code}
For BigQuery it will be unparsed into :
{code:sql}
SELECT REGEXP_EXTRACT_ALL(SUBSTR('chocolate chip cookies', 4), 'c+.{2}')
[OFFSET(2 - 1)] FROM foodmart.product
{code}
*4. REGEXP_SUBSTR(<CHARACTER>, <CHARACTER>, <INT>, <INT>, <CHARACTER>) [5
params] :*
{code:sql}
SELECT REGEXP_SUBSTR('chocolate Chip cookies', 'c+.{2}', 4, 2, 'i') FROM
foodmart.product
{code}
For BigQuery it will be unparsed into :
{code:sql}
SELECT REGEXP_EXTRACT_ALL(SUBSTR('chocolate Chip cookies', 4), '(?i)c+.{2}')
[OFFSET(2 - 1)] FROM foodmart.product
{code}
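The four rewrites above follow one pattern, which can be sketched as a small string-building function. This is only an illustration of the mapping, not the Calcite implementation: the function name {{to_bigquery}} and its parameters are hypothetical, the source argument is assumed to be already rendered as SQL text, the pattern is assumed to contain no quotes or backslashes, and only the {{'i'}} match parameter shown above is handled. The occurrence argument (1-based in REGEXP_SUBSTR) becomes a zero-based array offset.

```python
from typing import Optional, Union


def to_bigquery(
    source_sql: str,
    pattern: str,
    position: Optional[Union[int, str]] = None,
    occurrence: Optional[Union[int, str]] = None,
    match_param: Optional[str] = None,
) -> str:
    """Render a BigQuery expression for a REGEXP_SUBSTR call (hypothetical helper).

    source_sql is SQL text (a quoted literal or a column name); pattern is the
    raw regex text, assumed free of quotes and backslashes.
    """
    if match_param is not None:
        # Assumption: only the case-insensitive flag is supported in this sketch.
        if match_param != "i":
            raise ValueError("unsupported match parameter: " + match_param)
        pattern = "(?i)" + pattern

    # A start position is emulated by searching only the tail of the string.
    subject = source_sql if position is None else f"SUBSTR({source_sql}, {position})"
    pattern_sql = "'" + pattern + "'"

    if occurrence is None:
        # 2- and 3-parameter forms: the first match is enough.
        return f"REGEXP_EXTRACT({subject}, {pattern_sql})"

    # 4- and 5-parameter forms: collect all matches, then pick the requested one.
    # OFFSET is zero-based, while the occurrence argument is one-based.
    return f"REGEXP_EXTRACT_ALL({subject}, {pattern_sql}) [OFFSET({occurrence} - 1)]"


print(to_bigquery("'choco chico chipo'", "c+.{2}", 7))
# prints: REGEXP_EXTRACT(SUBSTR('choco chico chipo', 7), 'c+.{2}')
```

Because the occurrence is emitted as SQL text rather than evaluated, a column reference such as {{product_id}} from the example query in the issue description can be passed through unchanged.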
> Cannot parse REGEXP_SUBSTR in BigQuery
> --------------------------------------
>
> Key: CALCITE-3415
> URL: https://issues.apache.org/jira/browse/CALCITE-3415
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.21.0
> Reporter: Pranay Parmar
> Priority: Minor
>
> REGEXP_SUBSTR error :
> {code:java}
> No match found for function signature REGEXP_SUBSTR(<CHARACTER>, <CHARACTER>,
> [<INT>, <INT>, <CHARACTER>]){code}
>
> Example query:
> {code:sql}
> SELECT REGEXP_SUBSTR('chocolate Chip cookies', 'c+.{2}', 1, product_id, 'i')
> FROM public.account{code}