[ 
https://issues.apache.org/jira/browse/CALCITE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18090717#comment-18090717
 ] 

Julian Hyde commented on CALCITE-7608:
--------------------------------------

Yes. A few observations.

{{Uncollect}} could use a re-think. (Even though {{Uncollect}} is very old, and 
predates Calcite's entry into the ASF, I wasn't much involved in its creation.) 

Note that we did not make {{Uncollect}} a logical operator. This was 
intentional. Maybe now is the time, but adding a logical operator is a big 
deal, like adding a new kind of piece to a chess board. You have to think about 
how it will interact with all the existing pieces (operators).

It would be a mistake to add an operator with the full power of {{flatMap}}, 
because algebraic rewrites depend on each operator doing just one thing. So we 
need an operator that is just short of that.
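
To make the distinction concrete, here is a minimal plain-Java sketch. It is not Calcite API; the names {{UnnestSketch}}, {{Dept}}, {{DeptEmployee}}, {{flatMap}} and {{unnestEmployees}} are all hypothetical. It contrasts a general flatMap, which takes an arbitrary function and can therefore filter, project and unnest in one step, with a restricted form that only expands one nested field and carries the other field through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class UnnestSketch {
  /** Hypothetical input row: one scalar field and one nested collection. */
  record Dept(String name, List<String> employees) {}

  /** Hypothetical output row of the restricted operator. */
  record DeptEmployee(String name, String employee) {}

  /** Full flatMap: the caller supplies an arbitrary function per row, so
   * this single operator can filter, project and unnest all at once. */
  static <T, R> List<R> flatMap(List<T> input, Function<T, List<R>> f) {
    List<R> out = new ArrayList<>();
    for (T row : input) {
      out.addAll(f.apply(row));
    }
    return out;
  }

  /** Restricted form: expands the nested field and keeps the remaining
   * field unchanged. No filtering and no computed columns. */
  static List<DeptEmployee> unnestEmployees(List<Dept> input) {
    List<DeptEmployee> out = new ArrayList<>();
    for (Dept d : input) {
      for (String e : d.employees()) {
        out.add(new DeptEmployee(d.name(), e));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Dept> depts = List.of(
        new Dept("Sales", List.of("Ann", "Bob")),
        new Dept("Ops", List.of("Cy")));

    // flatMap doing three jobs in one step: filter on name,
    // unnest the employees, and upper-case each element.
    List<String> viaFlatMap = flatMap(depts, d ->
        d.name().equals("Sales")
            ? d.employees().stream().map(String::toUpperCase).toList()
            : List.<String>of());
    System.out.println(viaFlatMap);

    // The restricted operator does one job only.
    System.out.println(unnestEmployees(depts));
  }
}
```

In the first call the filter and the projection are hidden inside an opaque lambda, which is what makes rewrites hard. In the second they would have to be separate operators that a planner can see and move.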

When naming the operator, we should choose carefully. We should not call it 
"flatMap" if it is less powerful than flatMap. "Select" is a word we avoid in 
Calcite operator names. If "Unnest" really is the best name, we can reuse it 
(and obsolete the old Unnest operator).

Relational algebra has a particular problem with nested relations. Unlike in 
lambda-based languages such as LINQ and Morel, in relational algebra a nested 
relation is a different kind of value from a regular relation, and relational 
operators are not ordinary functions. A special operator is required to convert 
nested to regular. (And another operator to go the other direction.)
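
As a rough illustration of the two directions, the following plain-Java sketch nests flat rows into one row per key and then unnests them again. It is not Calcite code, and the names {{NestUnnestSketch}}, {{Flat}}, {{Nested}}, {{nest}} and {{unnest}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NestUnnestSketch {
  /** Hypothetical regular (flat) row. */
  record Flat(String dept, String employee) {}

  /** Hypothetical row holding a nested collection. */
  record Nested(String dept, List<String> employees) {}

  /** Regular to nested: group the flat rows by dept, keeping first-seen order. */
  static List<Nested> nest(List<Flat> rows) {
    Map<String, List<String>> groups = new LinkedHashMap<>();
    for (Flat r : rows) {
      groups.computeIfAbsent(r.dept(), k -> new ArrayList<>()).add(r.employee());
    }
    List<Nested> out = new ArrayList<>();
    groups.forEach((dept, emps) -> out.add(new Nested(dept, List.copyOf(emps))));
    return out;
  }

  /** Nested to regular: emit one flat row per element of the nested field. */
  static List<Flat> unnest(List<Nested> rows) {
    List<Flat> out = new ArrayList<>();
    for (Nested n : rows) {
      for (String e : n.employees()) {
        out.add(new Flat(n.dept(), e));
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<Flat> flat = List.of(
        new Flat("Sales", "Ann"),
        new Flat("Sales", "Bob"),
        new Flat("Ops", "Cy"));
    List<Nested> nested = nest(flat);
    System.out.println(nested);
    System.out.println(unnest(nested));
  }
}
```

In this sketch a row whose nested collection is empty produces no output rows when unnested, so the two directions are not exact inverses in general.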

Should this operator deal with correlated values? If so, what is its 
relationship with dependent joins?

So, those are my concerns. I'm not going to veto what you produce, but you need 
to produce a design, and the reasoning to back it up, not just a PR that doesn't 
break any tests.

> Introduce a SelectMany operator
> -------------------------------
>
>                 Key: CALCITE-7608
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7608
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.42.0
>            Reporter: Mihai Budiu
>            Assignee: Mihai Budiu
>            Priority: Minor
>              Labels: pull-request-available
>
> Today UNNEST is implemented using the Uncollect operator. We propose adding 
> an alternative LogicalSelectMany operator, which generalizes Uncollect. 
> (Note that the Enumerable API already has a SelectMany.) The main difference 
> between Uncollect and SelectMany is that Uncollect unnests all the fields of 
> its input relation, whereas LogicalSelectMany would only unnest SOME of the 
> fields of the input collection, preserving the other ones in each output row.
> This distinction is very important, because:
>  * LogicalSelectMany can be directly and efficiently implemented using the 
> Enumerable SelectMany
>  * UNNEST used in a cross-join is implemented using an Uncollect and a 
> LogicalCorrelate. However, the same UNNEST can be represented using just one 
> LogicalSelectMany node
>  * Neither the old nor the new decorrelator can actually eliminate 
> LogicalCorrelate nodes that are paired with Uncollect. Using 
> LogicalSelectMany we can decorrelate many more plans.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
