Re: Structuring SubQueries as Functions

James Starr Thu, 01 Dec 2022 13:23:25 -0800

Hi Julian,

I want to propose changing Calcite's RexSubQuery so that a sub-query would
have up to N+M rex children, where N is the number of correlated variables
used and M is the number of rex nodes in the expression.

SELECT t1.c1 IN (
  SELECT t2.c1
  FROM (VALUES (1, 2)) AS t2(c1, c2)
  WHERE t1.c2 = t2.c2
)
FROM (VALUES (1, 2)) AS t1(c1, c2)

This query would result in a plan that looks like:
PROJECT(IN($0, $1, {rel...}))
  VALUES(...)

Previously, only $0 would have been a child of the RexSubQuery, because it
is used in the IN clause; after the change, both $0 and $1 would be
children. When subqueries are evaluated or decorrelated, they would need
to dereference their correlated variables through their arguments instead
of directly using the scope of the RelNode. This would allow more
streamlined pass-through logic when manipulating RexSubQueries, since their
implicit correlated-variable contract with a RelNode becomes explicit. For
instance, RelFieldTrimmer currently trims fields that are only used as
correlated variables, and it also does not correctly shift the offsets of
referenced correlated variables. If the inputs of correlated variables were
explicitly called out as children of the RexSubQuery, those fields would
not be trimmed, and their offsets would be shifted correctly.
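To make the trimming argument concrete, here is a minimal Java sketch
(Java 16+, for records and Stream.toList). It does not use Calcite's API:
`Ref`, `SubQuery`, `usedFields` and `remap` are hypothetical stand-ins for
RexInputRef, RexSubQuery and the field mapping a trimmer applies. It assumes
the sub-query body refers to its inputs only by parameter position, so a
generic operand rewrite is enough to keep them correct after fields are
dropped. The extra unused column `x` is also an assumption, added so there
is something to trim.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the proposal; these are NOT Calcite's
// RexInputRef/RexSubQuery classes or RelFieldTrimmer's API.
public class ExplicitCorrelationSketch {

    // Reference to field i of the enclosing RelNode's input, printed as $i.
    record Ref(int index) {
        @Override
        public String toString() {
            return "$" + index;
        }
    }

    // A sub-query whose IN operand and correlated inputs are explicit
    // operands. The body refers to them only by parameter position
    // (?0, ?1, ...), never by field offset in the enclosing RelNode.
    record SubQuery(String body, List<Ref> operands) {
        @Override
        public String toString() {
            return body + operands;
        }
    }

    // Fields of the enclosing input that the sub-query uses. Because every
    // correlated input is an operand, none of them can be missed.
    static List<Integer> usedFields(SubQuery q) {
        return q.operands().stream().map(Ref::index).distinct().sorted().toList();
    }

    // Rewrite operands through an old -> new field mapping, as a generic
    // trimmer would after dropping unused fields. The body is untouched.
    static SubQuery remap(SubQuery q, Map<Integer, Integer> mapping) {
        List<Ref> shifted = q.operands().stream()
                .map(r -> new Ref(mapping.get(r.index())))
                .toList();
        return new SubQuery(q.body(), shifted);
    }

    public static void main(String[] args) {
        // Assumed input row type (x, c1, c2): x is an unused column added for
        // illustration; $1 is t1.c1 (IN operand), $2 is t1.c2 (correlated).
        SubQuery q = new SubQuery("?0 IN (SELECT c1 FROM t2 WHERE c2 = ?1)",
                List.of(new Ref(1), new Ref(2)));

        List<Integer> used = usedFields(q);
        // Trimming keeps only the used fields, packed from position 0.
        Map<Integer, Integer> mapping = new HashMap<>();
        for (int i = 0; i < used.size(); i++) {
            mapping.put(used.get(i), i);
        }

        System.out.println(used);
        System.out.println(remap(q, mapping));
    }
}
```

Running `main` prints `[1, 2]` and then
`?0 IN (SELECT c1 FROM t2 WHERE c2 = ?1)[$0, $1]`: both inputs survive
trimming and are shifted left by one, without the trimmer having to look
inside the sub-query body.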

I hope this makes it clearer what I am proposing.

James

On Thu, Dec 1, 2022 at 12:01 PM Julian Hyde <[email protected]> wrote:

> I do agree that a correlated sub-query is a function call. If you write
> your queries using CROSS APPLY, this becomes clear.
>
> Decorrelation is very useful. For some execution engines, especially the
> highly parallel/distributed ones, stopping and restarting subqueries
> requires a lot of communication. So Calcite supports decorrelation, and it
> is Calcite’s preferred execution strategy. But there are definitely
> engines, and queries, that are better executed in correlated form.
>
> By the way, the Froid project [1] takes this idea to the limit, and
> applies decorrelation techniques to function calls (creating ‘magic sets’
> of all possible arguments).
>
> Calcite’s decorrelation code is old and brittle. But if I recall
> correctly, you don’t have to do decorrelation in SqlToRelConverter; you can
> defer it and do the decorrelation using planner rules.
>
> Julian
>
> [1] https://dl.acm.org/doi/10.1145/3186728.3164140
>
>
> > On Dec 1, 2022, at 11:09 AM, James Starr <[email protected]> wrote:
> >
> > Currently, sub-query correlated variables have a brittle contract with
> > their containing RelNode. Simple rules, such as ones that transpose
> > filters and projects, are unaware of this contract, and it would be
> > difficult to retrofit all the rules to be sub-query aware.
> >
> > A correlated sub-query is logically a function call whose parameters
> > are the values used for the correlated inputs. If the SubQuery object
> > were structured such that the inputs used as correlated variables were
> > explicit sub-nodes of the sub-query object, then most rules and
> > utilities, such as the trimmer, would just work as expected. SqlToRel
> > could also be simplified, since there would be only one place to add the
> > CorrelationId as opposed to three.
> >
> > James
>
>
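
The "sub-query as function call" view from this thread, and the
decorrelation Julian describes, can be sketched with plain Java collections
(Java 16+) for the query at the top of this message. This is a hypothetical
illustration, not how Calcite evaluates or decorrelates plans: the class and
method names are invented, the sample rows are assumed (the original query
has a single (1, 2) row per table), and NULLs and SQL three-valued logic for
IN are ignored. `correlated` calls the sub-query once per outer row with
t1.c1 and t1.c2 as arguments; `decorrelated` evaluates it once for all
correlation values and then joins on c2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration only; this is not how Calcite evaluates plans.
// NULLs and SQL three-valued logic for IN are ignored.
public class SubQueryAsFunction {

    record Row(int c1, int c2) {}

    // Correlated form: the sub-query is a function of its parameters,
    // evaluating p1 IN (SELECT t2.c1 FROM t2 WHERE t2.c2 = p2).
    static boolean inSubQuery(List<Row> t2, int p1, int p2) {
        for (Row r : t2) {
            if (r.c2() == p2 && r.c1() == p1) {
                return true;
            }
        }
        return false;
    }

    // Call the "function" once per outer row, passing t1.c1 and t1.c2.
    static List<Boolean> correlated(List<Row> t1, List<Row> t2) {
        List<Boolean> out = new ArrayList<>();
        for (Row outer : t1) {
            out.add(inSubQuery(t2, outer.c1(), outer.c2()));
        }
        return out;
    }

    // Decorrelated form: evaluate the sub-query once for every value of the
    // correlation column (t2.c1 grouped by t2.c2), then look each outer
    // row up by its t1.c2, which behaves like a hash join on c2.
    static List<Boolean> decorrelated(List<Row> t1, List<Row> t2) {
        Map<Integer, Set<Integer>> byC2 = t2.stream().collect(
                Collectors.groupingBy(Row::c2,
                        Collectors.mapping(Row::c1, Collectors.toSet())));
        List<Boolean> out = new ArrayList<>();
        for (Row outer : t1) {
            out.add(byC2.getOrDefault(outer.c2(), Set.of()).contains(outer.c1()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed sample rows; the original query uses a single (1, 2) row each.
        List<Row> t1 = List.of(new Row(1, 2), new Row(3, 2), new Row(1, 5));
        List<Row> t2 = List.of(new Row(1, 2), new Row(4, 5));
        System.out.println(correlated(t1, t2));
        System.out.println(decorrelated(t1, t2));
    }
}
```

With the assumed rows, both methods print `[true, false, false]`. The
correlated form needs nothing but the arguments of each call, which is what
makes it attractive to model those arguments as explicit children.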
