[jira] [Commented] (CALCITE-3789) Support validation of UNNEST multiple array columns like Presto

Will Yu (Jira) Thu, 02 Apr 2020 22:50:15 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-3789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074288#comment-17074288
 ]


Will Yu commented on CALCITE-3789:
----------------------------------

[~julianhyde] Makes sense.

To convert SqlNode to RelNode, we need to 
* keep the STRUCT type instead of unboxing it in _Uncollect_
* add aliases for all to-be-unnested columns when building RelDataType

So I added a new field holding all column aliases to _Uncollect_, and use it to 
determine whether to unbox the STRUCT type and, if not, which aliases to use.
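
The rule above can be sketched roughly as follows. This is only an illustration in Python, not Calcite code: `derive_unnest_fields`, the list-of-pairs encoding of a STRUCT type and the placeholder column names are all hypothetical stand-ins for what _Uncollect_ would do when deriving its row type.

```python
# Hypothetical, simplified type model (not Calcite's RelDataType):
# a scalar type is a string such as "INTEGER"; a STRUCT type is a
# list of (field_name, field_type) pairs.


def derive_unnest_fields(element_types, aliases=None):
    """Return the (name, type) pairs of the row produced by an UNNEST.

    element_types: the element type of each array being unnested.
    aliases: one alias per unnested column, or None when no aliases
             were recorded.
    """
    if aliases is None:
        # No aliases recorded: unbox each STRUCT element into its sub-fields.
        fields = []
        for i, element_type in enumerate(element_types):
            if isinstance(element_type, list):
                fields.extend(element_type)
            else:
                # Placeholder name, chosen only for this sketch.
                fields.append((f"col{i}", element_type))
        return fields

    if len(aliases) != len(element_types):
        raise ValueError("need exactly one alias per unnested column")

    # Aliases recorded: keep every element type intact (STRUCT included)
    # and name each output column after its alias.
    return list(zip(aliases, element_types))
```

With a STRUCT element type such as `[("X", "INTEGER"), ("Y", "INTEGER")]`, the call without aliases yields the two sub-fields, while the call with alias `p` yields a single column `p` that still carries the STRUCT type.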

My questions are:
* Besides collection types, a MAP type can also be unnested, into two 
columns (key & value). How should we handle the MAP type in this case?
* In SqlToRelConverterTest, I turned off _decorrelate_ because the decorrelated 
type is not the same as the type after validation. After an initial investigation, 
it seems that the row type of Uncollect is not fully flattened, but the index 
is calculated on a fully flattened base 
(RelStructuredTypeFlattener.postFlattenSize). Is _decorrelate_ a required 
step for this ticket?
* More generally, I am not sure whether we should keep the SqlToRelConverter 
changes and the validation changes together, or split them into separate 
tickets & PRs.

Thanks!

> Support validation of UNNEST multiple array columns like Presto
> ---------------------------------------------------------------
>
>                 Key: CALCITE-3789
>                 URL: https://issues.apache.org/jira/browse/CALCITE-3789
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.21.0
>            Reporter: Will Yu
>            Assignee: Will Yu
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 9h
>  Remaining Estimate: 0h
>
> In Presto, users are able to UNNEST multiple array columns and CROSS JOIN 
> the result with the original table, as shown in the [Presto 
> doc|https://prestodb.io/docs/current/sql/select.html]:
> {code:sql}
> SELECT numbers, animals, n, a
> FROM (
>   VALUES
>     (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
>     (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])
> ) AS x (numbers, animals)
> CROSS JOIN UNNEST(numbers, animals) AS t (n, a)
> {code}
> yields:
>   numbers  |     animals      |  n   |  a
> -----------+------------------+------+------
>  [2, 5]    | [dog, cat, bird] |    2 | dog
>  [2, 5]    | [dog, cat, bird] |    5 | cat
>  [2, 5]    | [dog, cat, bird] | NULL | bird
>  [7, 8, 9] | [cow, pig]       |    7 | cow
>  [7, 8, 9] | [cow, pig]       |    8 | pig
>  [7, 8, 9] | [cow, pig]       |    9 | NULL
> Calcite does not appear to support these semantics. In Calcite, for the 
> SQL above, _n_ and _a_ are treated as aliases of the subfields of numbers.
> The plan is to introduce a new Presto conformance and enable validation 
> of such queries.
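
For reference, the row expansion shown in the quoted example can be mimicked in a few lines of Python. This is a sketch of the semantics only, not of how Presto or Calcite implement it; `cross_join_unnest` is a hypothetical helper name, and `None` stands in for SQL NULL.

```python
from itertools import zip_longest


def cross_join_unnest(rows):
    """Expand each row of arrays the way the quoted example does.

    The arrays of one row are unnested side by side, shorter arrays are
    padded with None, and every unnested tuple is appended to a copy of
    the original row.
    """
    result = []
    for row in rows:
        for unnested in zip_longest(*row, fillvalue=None):
            result.append(tuple(row) + unnested)
    return result


rows = [
    ([2, 5], ["dog", "cat", "bird"]),
    ([7, 8, 9], ["cow", "pig"]),
]
for expanded in cross_join_unnest(rows):
    print(expanded)
```

Each input row produces as many output rows as its longest array has elements, which is why the first row yields a `None` for _n_ and the second a `None` for _a_.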



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
