Tanner Clary created CALCITE-5644:
-------------------------------------
Summary: Implement BigQuery CONTAINS_SUBSTR
Key: CALCITE-5644
URL: https://issues.apache.org/jira/browse/CALCITE-5644
Project: Calcite
Issue Type: Task
Reporter: Tanner Clary
BigQuery offers the
[CONTAINS_SUBSTR|https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions#contains_substr]
function that returns {{TRUE}} if a substring is present in an expression and
{{FALSE}} if it is not. A basic example of this may be seen in [1].
The expression can take many forms (more info in the linked doc) which makes
its implementation more complex than other string functions that only accept
arguments of type {{STRING}}. For instance, the expression to be searched can
be a column or table reference.
The function also has an optional third argument called {{json_scope}} where
the user can indicate the scope of JSON data (keys, values, or both) to be
searched for the substring.
I am curious if anyone has thoughts on how the search of rows or tables could
be implemented. I have a basic implementation (that supports expressions of
type {{STRING}} and nothing else) that I will open a draft PR for as a starting
point. To me, the challenge is implementing the additional features like the
{{JSON_SCOPE}} argument (seen in [2]) and performing a cross field search as
seen in [3].
[1] {{SELECT CONTAINS_SUBSTR("hello", "he");}} would return {{TRUE}}.
[2]
{{SELECT *
FROM Recipes
WHERE CONTAINS_SUBSTR(
(SELECT AS STRUCT Recipes.* EXCEPT (Lunch, Dinner)),
'potato'
);}} would return:
/*-------------------+-------------------------+------------------*
| Breakfast | Lunch | Dinner
|
+-------------------+-------------------------+------------------+
| Potato pancakes | Toasted cheese sandwich | Beef stroganoff |
*-------------------+-------------------------+------------------*/
--
This message was sent by Atlassian Jira
(v8.20.10#820010)