[ 
https://issues.apache.org/jira/browse/CALCITE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihai Budiu updated CALCITE-7608:
---------------------------------
    Description: 
Today SQL UNNEST is implemented using the Uncollect operator. We propose adding 
two additional capabilities to Uncollect, which generalize its current 
behavior: 


 * support for LEFT JOIN UNNEST
 * support for carrying over non-collection input fields to the output unchanged

The first capability is needed because the existing semantics of Uncollect 
cannot implement UNNEST on the right-hand side of a LEFT JOIN: in that 
position Uncollect must produce a NULL for empty or NULL collections, 
whereas today it produces no rows for them.
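
To make the difference concrete, here is a minimal sketch in plain Java (not 
Calcite code). The class LeftUnnestSketch, the method unnest, and the 
preserveEmpty flag are hypothetical names chosen for illustration; the 
actual operator may expose this differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class LeftUnnestSketch {
    /**
     * Hypothetical helper, not a Calcite API.
     * With preserveEmpty == false an empty or NULL collection yields no rows
     * (the existing behavior). With preserveEmpty == true it yields a single
     * NULL element, as LEFT JOIN UNNEST requires.
     */
    public static <T> List<T> unnest(List<T> collection, boolean preserveEmpty) {
        if (collection == null || collection.isEmpty()) {
            if (preserveEmpty) {
                List<T> single = new ArrayList<>();
                single.add(null); // one NULL stands in for the missing elements
                return single;
            }
            return Collections.emptyList();
        }
        return new ArrayList<>(collection);
    }

    public static void main(String[] args) {
        System.out.println(unnest(List.of(1, 2), false));         // [1, 2]
        System.out.println(unnest(List.<Integer>of(), false));    // []
        System.out.println(unnest(List.<Integer>of(), true));     // [null]
        System.out.println(unnest((List<Integer>) null, true));   // [null]
    }
}
```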


The second capability is independent of the first one: it would strictly 
increase the expressive power of Uncollect, allowing it not only to unnest 
collections, but also to copy some input fields unchanged for every element 
of an unnested collection.
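
As an illustration only, the carry-over behavior can be sketched in plain 
Java. The names CarryOverSketch and unnestKeeping are hypothetical, and the 
output layout (the element replaces the collection in the same column) is an 
assumption; the real operator may order its output fields differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CarryOverSketch {
    /**
     * Hypothetical helper, not a Calcite API.
     * For each input row, emits one output row per element of the collection
     * stored at collectionIndex; all other fields are copied unchanged.
     * Rows whose collection is NULL or empty produce no output here.
     */
    public static List<List<Object>> unnestKeeping(List<List<Object>> rows,
                                                   int collectionIndex) {
        List<List<Object>> out = new ArrayList<>();
        for (List<Object> row : rows) {
            Object field = row.get(collectionIndex);
            if (!(field instanceof List)) {
                continue; // NULL (or non-collection) value: nothing to unnest
            }
            for (Object element : (List<?>) field) {
                List<Object> outRow = new ArrayList<>(row); // copy carried fields
                outRow.set(collectionIndex, element);       // swap in the element
                out.add(outRow);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<Object>> rows = List.of(
            Arrays.<Object>asList(1, List.of("a", "b")),
            Arrays.<Object>asList(2, List.of()));
        System.out.println(unnestKeeping(rows, 1)); // [[1, a], [1, b]]
    }
}
```

The field 1 is repeated for every element without a Correlate-style join 
back to the input row.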

The main reason for the second capability is that it would enable us to 
represent plans using SQL UNNEST without using Correlate nodes. A sub-plan 
containing Project + Correlate + Uncollect + Values would become a single 
Uncollect node. Neither the old nor the new decorrelator can actually eliminate 
Correlate + Uncollect, because today there is no other representation that 
expresses the same behavior. Using the new representation, we can decorrelate 
many more plans (perhaps all plans).

Although these two changes are logically independent, I propose making them in 
a single combined PR, because both would change the type checking and 
runtime implementation of the operator, and it is desirable to minimize the 
churn for consumers of the operator.

A temporary configuration flag would be introduced to control whether 
SqlToRelConverter uses the new form of the operator, or just the old form. 
Hopefully programs that only use the old form would remain unchanged. This will 
give Calcite users time to upgrade to the new representation at their own pace.

  was:
Today UNNEST is implemented using the Uncollect operator. We propose adding an 
alternative LogicalSelectMany operator, which generalizes Uncollect. (Notice 
that Enumerable API already has a SelectMany.) The main difference between 
Uncollect and SelectMany is that Uncollect unnests all the fields of its input 
relation, whereas LogicalSelectMany would only unnest SOME of the fields of the 
input collection, preserving the other ones in each output row.

This distinction is very important, because:
 * LogicalSelectMany can be directly and efficiently implemented using the 
Enumerable SelectMany
 * UNNEST used in a cross-join is implemented using an Uncollect and a 
LogicalCorrelate. However, the same UNNEST can be represented using just one 
LogicalSelectMany node
 * Neither the old nor the new decorrelator can actually eliminate 
LogicalCorrelate nodes that are paired with Uncollect. Using LogicalSelectMany 
we can decorrelate many more plans.


> Enhance Uncollect 
> ------------------
>
>                 Key: CALCITE-7608
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7608
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>              Labels: pull-request-available


