[
https://issues.apache.org/jira/browse/CALCITE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Mihai Budiu updated CALCITE-7608:
---------------------------------
Description:
Today SQL UNNEST is implemented using the Uncollect operator. We propose adding
two additional capabilities to Uncollect, which generalize its current
behavior:
* support for LEFT JOIN UNNEST
* support for carrying over non-collection input fields to the output unchanged
The first capability is needed because UNNEST on the right-hand side of a LEFT
JOIN cannot be implemented with the existing semantics of Uncollect: in such
cases Uncollect must produce a NULL for an empty or NULL collection, rather
than producing no rows.
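A minimal, hypothetical Java sketch of the difference, assuming a plain
java.util.List stands in for the single collection value that Uncollect unnests
under a Correlate. The names `unnestOne` and `leftSemantics` are illustrative
only, not Calcite API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch, not Calcite code: a java.util.List stands in for one
// collection value, i.e. what Uncollect sees under a Correlate.
class UncollectLeftSketch {

  // Returns one output value per collection element. With leftSemantics, a
  // NULL or empty collection yields a single NULL value instead of no values.
  static <T> List<T> unnestOne(List<T> collection, boolean leftSemantics) {
    if (collection == null || collection.isEmpty()) {
      if (leftSemantics) {
        return Collections.singletonList(null); // one NULL-padded row
      }
      return Collections.emptyList();           // existing behavior: no rows
    }
    return new ArrayList<>(collection);
  }

  public static void main(String[] args) {
    System.out.println(unnestOne(List.of("a", "b"), true)); // [a, b]
    System.out.println(unnestOne(List.of(), false));        // []
    System.out.println(unnestOne(List.of(), true));         // [null]
  }
}
```

The NULL row is what lets the enclosing LEFT JOIN keep the left-hand row when
the collection has no elements.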
The second capability is independent of the first one: it would strictly
increase the expressive power of Uncollect, allowing it not only to unnest
collections, but also to copy some input fields unchanged into the output row
produced for every element of an unnested collection.
The main reason for the second capability is that it would enable us to
represent plans that use SQL UNNEST without Correlate nodes: a sub-plan
containing Project + Correlate + Uncollect + Values would become a single
Uncollect node. Neither the old nor the new decorrelator can eliminate a
Correlate + Uncollect pair, because there is no other representation that
expresses the same behavior. Using the new representation we could decorrelate
many more plans (perhaps all plans).
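To make the Correlate-free representation concrete, here is a hypothetical Java
sketch in which lists of records stand in for relations. `viaCorrelate` mimics
today's shape by re-running a per-row unnest inside a nested-loop correlate,
while `viaGeneralizedUncollect` is a single pass that unnests `tags` and carries
`id` over unchanged. All names (`InRow`, `OutRow`, `leftSemantics`, ...) are
illustrative assumptions, not Calcite API; records require Java 16+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Function;

// Hypothetical sketch, not Calcite code: plain Java lists stand in for relations.
class UncollectCarryOverSketch {

  record InRow(int id, List<String> tags) {} // "tags" is the collection field
  record OutRow(int id, String tag) {}       // "id" is carried over unchanged

  // Old-style Uncollect over a single collection value (Values + Uncollect).
  static List<String> uncollect(List<String> tags) {
    return tags == null ? List.of() : tags;
  }

  // Nested-loop correlate: re-evaluates the inner sub-plan for every left row.
  static <L, R, O> List<O> correlate(
      List<L> left, Function<L, List<R>> inner, BiFunction<L, R, O> combine) {
    List<O> out = new ArrayList<>();
    for (L l : left) {
      for (R r : inner.apply(l)) {
        out.add(combine.apply(l, r));
      }
    }
    return out;
  }

  // Today's shape: Correlate + Uncollect, with a projection building the row.
  static List<OutRow> viaCorrelate(List<InRow> input) {
    return correlate(input, row -> uncollect(row.tags()),
        (row, tag) -> new OutRow(row.id(), tag));
  }

  // Proposed shape: one operator that unnests "tags" and copies "id".
  // With leftSemantics, a NULL or empty collection yields (id, NULL).
  static List<OutRow> viaGeneralizedUncollect(List<InRow> input, boolean leftSemantics) {
    List<OutRow> out = new ArrayList<>();
    for (InRow row : input) {
      if (row.tags() == null || row.tags().isEmpty()) {
        if (leftSemantics) {
          out.add(new OutRow(row.id(), null));
        }
        continue;
      }
      for (String tag : row.tags()) {
        out.add(new OutRow(row.id(), tag));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<InRow> input = List.of(
        new InRow(1, List.of("a", "b")),
        new InRow(2, List.of()),
        new InRow(3, null));
    System.out.println(viaCorrelate(input).equals(viaGeneralizedUncollect(input, false))); // true
    System.out.println(viaGeneralizedUncollect(input, true));
  }
}
```

With `leftSemantics` set to false the two functions agree row for row; setting
it to true additionally emits (id, NULL) for rows whose collection is NULL or
empty, so one operator covers both proposed capabilities.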
Although these two changes are logically independent, I propose to make them in
a single combined PR: both would change the type checking and the runtime
implementation of the operator, and it is desirable to minimize churn for
consumers of the new operator.
A temporary configuration flag would be introduced to control whether
SqlToRelConverter uses the new form of the operator or the old one. Ideally,
programs that only use the old form would remain unchanged. This will give
Calcite users time to upgrade to the new representation at their own pace.
was:
Today UNNEST is implemented using the Uncollect operator. We propose adding an
alternative LogicalSelectMany operator, which generalizes Uncollect. (Notice
that the Enumerable API already has a SelectMany.) The main difference between
Uncollect and SelectMany is that Uncollect unnests all the fields of its input
relation, whereas LogicalSelectMany would unnest only SOME of the fields of the
input relation, preserving the other ones in each output row.
This distinction is very important, because:
* LogicalSelectMany can be directly and efficiently implemented using the
Enumerable SelectMany
* UNNEST used in a cross-join is implemented using an Uncollect and a
LogicalCorrelate. However, the same UNNEST can be represented using just one
LogicalSelectMany node
* Neither the old nor the new decorrelator can actually eliminate
LogicalCorrelate nodes that are paired with Uncollect. Using LogicalSelectMany
we can decorrelate many more plans.
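The SelectMany correspondence can be illustrated with java.util.stream, where
`Stream.flatMap` stands in for the role Linq4j's `Enumerable.selectMany` would
play in an enumerable implementation. The record types and the `selectMany`
helper below are hypothetical; the sketch only shows unnesting one field while
preserving another.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not Calcite code: Stream.flatMap plays the role of SelectMany.
class SelectManySketch {

  record InRow(int id, List<String> tags) {}
  record OutRow(int id, String tag) {}

  // Unnest only "tags"; "id" is copied into every output row.
  static List<OutRow> selectMany(List<InRow> input) {
    return input.stream()
        .filter(row -> row.tags() != null)            // a NULL collection produces no rows
        .flatMap(row -> row.tags().stream()
            .map(tag -> new OutRow(row.id(), tag)))   // pair each element with its row's id
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<InRow> input = List.of(
        new InRow(1, List.of("a", "b")),
        new InRow(2, List.of("c")));
    System.out.println(selectMany(input).size()); // 3
  }
}
```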
> Enhance Uncollect
> ------------------
>
> Key: CALCITE-7608
> URL: https://issues.apache.org/jira/browse/CALCITE-7608
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.42.0
> Reporter: Mihai Budiu
> Assignee: Mihai Budiu
> Priority: Minor
> Labels: pull-request-available