Re: Structuring SubQueries as Functions

Julian Hyde Mon, 05 Dec 2022 18:20:37 -0800

I don’t see any problem with this proposal. But I’m too busy to give it serious 
thought. Can someone else please review?


> On Dec 1, 2022, at 1:23 PM, James Starr <[email protected]> wrote:
> 
> Hi Julian,
> 
> I want to propose changing Calcite's RexSubQuery so that the SubQuery would
> have up to N+M rex children, where N is the number of correlated variables
> used and M is the number of rex nodes for the expression.
> 
> SELECT t1.c1 IN (
>  SELECT t2.c1
>  FROM (VALUES (1, 2)) AS t2(c1, c2)
>  WHERE t1.c2 = t2.c2
> )
> FROM (VALUES (1, 2)) AS t1(c1, c2)
> 
> Would result in a query that looks like
> PROJECT ( ($0 IN {rel...}, $1)
>  VALUES(..)
> 
> Previously only $0 would have been a child of the RexSubQuery, because it
> is used in the IN clause; after the change, both $0 and $1 would be
> children.  When subqueries are evaluated or decorrelated, they would need
> to dereference their correlated variables through their arguments instead
> of directly using the scope of the RelNode.  This would allow for more
> streamlined pass-through logic when manipulating RexSubQueries, since their
> implicit correlated variable contract with a RelNode is now explicit.  For
> instance, RelFieldTrimmer currently trims fields that are only used as
> correlated variables, and it also does not correctly shift the offset of
> referenced correlated variables.  However, if the inputs of correlated
> variables are explicitly called out as children of the RexSubQuery, then
> those fields would not be trimmed and their offsets would be shifted
> correctly.
> 
> I hope this makes it clearer what I am proposing.
> 
> James
> 
> 
> On Thu, Dec 1, 2022 at 12:01 PM Julian Hyde <[email protected]> wrote:
> 
>> I do agree that a correlated sub-query is a function call. If you write
>> your queries using CROSS APPLY this becomes clear.
>> 
>> Decorrelation is very useful. For some execution engines, especially the
>> highly parallel/distributed ones, stopping and restarting subqueries
>> requires a lot of communication. So Calcite supports decorrelation, and it
>> is Calcite’s preferred execution strategy. But there are definitely
>> engines, and queries, that are better executed in correlated form.
>> 
>> By the way, the Froid project [1] takes this idea to the limit, and
>> applies decorrelation techniques to function calls (creating ‘magic sets’
>> of all possible arguments).
>> 
>> Calcite’s decorrelation code is old and brittle. But if I recall
>> correctly, you don’t have to do decorrelation in SqlToRelConverter; you can
>> defer, and do the decorrelation using planner rules.
>> 
>> Julian
>> 
>> [1] https://dl.acm.org/doi/10.1145/3186728.3164140
>> 
>> 
>>> On Dec 1, 2022, at 11:09 AM, James Starr <[email protected]> wrote:
>>> 
>>> Currently sub-query correlated variables have a brittle contract with
>>> their containing RelNode.  Simple rules, such as ones that transpose
>>> filters and projects, are unaware of this contract, and it would be
>>> difficult to retrofit all such rules to be sub-query aware.
>>> 
>>> A correlated sub-query is logically a function call whose
>>> parameters are the values used for the correlated inputs.  If the
>>> SubQuery object was structured such that the inputs that are used as
>>> correlated variables were explicit sub nodes of the sub-query object,
>>> then most rules and utilities, such as the trimmer, would just work as
>>> expected.  SqlToRel could also be simplified since there would only be
>>> one place to add the CorrelationId, as opposed to three.
>>> 
>>> James
>> 
>>
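
The trimming argument in the thread can be illustrated with a small toy
model. This is only a sketch under stated assumptions: SubQueryOperandsSketch,
SubQueryCall, usedFields, trimMapping and remap are hypothetical names made up
for illustration and are not Calcite classes or APIs, and the 4-field input is
invented so that the index shift is visible. It shows that once the correlated
input is an explicit operand, a generic "collect used fields, then remap"
pass keeps it and shifts it without any sub-query-specific logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only: none of these types are Calcite classes.
final class SubQueryOperandsSketch {

  /** A sub-query call whose operands are field indexes of the enclosing rel's input. */
  static final class SubQueryCall {
    final String kind;            // e.g. "IN"
    final List<Integer> operands; // M expression operands, then N correlated inputs

    SubQueryCall(String kind, List<Integer> operands) {
      this.kind = kind;
      this.operands = List.copyOf(operands);
    }
  }

  /** Fields a trimmer must keep: every field referenced by any operand. */
  static TreeSet<Integer> usedFields(List<SubQueryCall> calls) {
    TreeSet<Integer> used = new TreeSet<>();
    for (SubQueryCall call : calls) {
      used.addAll(call.operands);
    }
    return used;
  }

  /** Maps each kept field's old index to its index after unused fields are dropped. */
  static Map<Integer, Integer> trimMapping(int fieldCount, Set<Integer> used) {
    Map<Integer, Integer> mapping = new TreeMap<>();
    int next = 0;
    for (int i = 0; i < fieldCount; i++) {
      if (used.contains(i)) {
        mapping.put(i, next++);
      }
    }
    return mapping;
  }

  /** Rewrites the operands through the mapping, so correlated inputs shift too. */
  static SubQueryCall remap(SubQueryCall call, Map<Integer, Integer> mapping) {
    List<Integer> shifted = new ArrayList<>();
    for (int field : call.operands) {
      Integer target = mapping.get(field);
      if (target == null) {
        throw new IllegalStateException("field $" + field + " was trimmed");
      }
      shifted.add(target);
    }
    return new SubQueryCall(call.kind, shifted);
  }

  public static void main(String[] args) {
    // Hypothetical 4-field input: $1 is the IN operand, $3 the correlated input.
    SubQueryCall call = new SubQueryCall("IN", List.of(1, 3));
    Map<Integer, Integer> mapping = trimMapping(4, usedFields(List.of(call)));
    System.out.println(mapping + " -> " + remap(call, mapping).operands);
  }
}
```

If the call listed only $1 as an operand (the implicit contract described in
the thread), usedFields would not report $3, the mapping would drop it, and
anything still referring to $3 as a correlated variable would point at a
field that no longer exists.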
