I don’t see any problem with this proposal. But I’m too busy to give it serious thought. Can someone else please review?

> On Dec 1, 2022, at 1:23 PM, James Starr <[email protected]> wrote:
>
> Hi Julian,
>
> I want to propose changing Calcite's RexSubQuery. The SubQuery would
> have up to N+M rex children (N is the number of correlated variables used
> and M is the number of rex nodes for the expression).
>
> SELECT t1.c1 IN (
>   SELECT t2.c1
>   FROM (VALUES (1, 2)) AS t2(c1, c2)
>   WHERE t1.c2 = t2.c2
> )
> FROM (VALUES (1, 2)) AS t1(c1, c2)
>
> Would result in a query that looks like
> PROJECT ( ($0 IN {rel...}, $1)
>   VALUES(..)
>
> Previously only $0 would have been a child of the RexSubQuery, because it
> is used in the IN clause. After the change, both $0 and $1 would be
> children. When subqueries are evaluated or decorrelated, they would need
> to dereference their correlated variables through their arguments instead
> of directly using the scope of the RelNode. This would allow for more
> streamlined pass-through logic when manipulating RexSubQueries, since
> their implicit correlated variable contract with a RelNode is now
> explicit. Take, for instance, RelFieldTrimmer, which currently trims
> fields that are only used as correlated variables. RelFieldTrimmer also
> does not correctly shift the offset of referenced correlated variables.
> However, if the inputs of correlated variables are explicitly called out
> as children of the RexSubQuery, then those fields would not be trimmed
> and would be shifted correctly.
>
> I hope this makes it clearer what I am proposing.
>
> James
>
>
> On Thu, Dec 1, 2022 at 12:01 PM Julian Hyde <[email protected]> wrote:
>
>> I do agree that a correlated sub-query is a function call. If you write
>> your queries using CROSS APPLY this becomes clear.
>>
>> Decorrelation is very useful. For some execution engines, especially the
>> highly parallel/distributed ones, stopping and restarting subqueries
>> requires a lot of communication. So Calcite supports decorrelation, and it
>> is Calcite’s preferred execution strategy. But there are definitely
>> engines, and queries, that are better executed in correlated form.
>>
>> By the way, the Froid project [1] takes this idea to the limit, and
>> applies decorrelation techniques to function calls (creating ‘magic sets’
>> of all possible arguments).
>>
>> Calcite’s decorrelation code is old and brittle. But if I recall
>> correctly, you don’t have to do decorrelation in SqlToRelConverter; you can
>> defer, and do the decorrelation using planner rules.
>>
>> Julian
>>
>> [1] https://dl.acm.org/doi/10.1145/3186728.3164140
>>
>>
>>> On Dec 1, 2022, at 11:09 AM, James Starr <[email protected]> wrote:
>>>
>>> Currently sub-query correlated variables have a brittle contract with
>>> their containing RelNode. Simple rules, such as ones that transpose
>>> filters and projects, are unaware of this contract, and it would be
>>> difficult to retrofit all the rules to be sub-query aware.
>>>
>>> A correlated sub-query is logically a function call whose
>>> parameters are the values used for the correlated inputs. If the
>>> SubQuery object was structured such that the inputs that are used as
>>> correlated variables were explicit sub nodes of the sub-query object,
>>> then most rules and utilities, such as the trimmer, would just work as
>>> expected. SqlToRel could also be simplified since there would be only
>>> one place to add the CorrelationId as opposed to three.
>>>
>>> James
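
For anyone picking up the review, here is a rough sketch of the idea in the thread, as I read it: if the fields a sub-query reads as correlated variables are listed as explicit operands, a trimmer can treat them like any other field reference, keeping them and renumbering them. This is a toy model only. ToySubQuery, trimMapping and remap are hypothetical names made up for illustration, and none of this uses or mirrors Calcite's actual RexSubQuery or RelFieldTrimmer code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model only; none of these types are Calcite classes.
final class SubQueryOperandsSketch {

  /** Hypothetical stand-in for a RexSubQuery whose inputs are all explicit operands. */
  static final class ToySubQuery {
    final List<Integer> exprOperands;       // M fields used by the expression, e.g. $0 in "$0 IN (...)"
    final List<Integer> correlatedOperands; // N fields read as correlated variables, e.g. $1

    ToySubQuery(List<Integer> exprOperands, List<Integer> correlatedOperands) {
      this.exprOperands = exprOperands;
      this.correlatedOperands = correlatedOperands;
    }

    /** Every input field the call depends on, in ascending order. */
    TreeSet<Integer> usedFields() {
      TreeSet<Integer> used = new TreeSet<>(exprOperands);
      used.addAll(correlatedOperands);
      return used;
    }

    /** Rewrites all operands through an old-index to new-index mapping. */
    ToySubQuery remap(Map<Integer, Integer> mapping) {
      return new ToySubQuery(
          remapAll(exprOperands, mapping), remapAll(correlatedOperands, mapping));
    }

    private static List<Integer> remapAll(List<Integer> fields, Map<Integer, Integer> mapping) {
      List<Integer> out = new ArrayList<>();
      for (Integer f : fields) {
        out.add(mapping.get(f));
      }
      return out;
    }
  }

  /** Mapping a trimmer could apply: keep only used fields and renumber them densely. */
  static Map<Integer, Integer> trimMapping(TreeSet<Integer> used) {
    Map<Integer, Integer> mapping = new TreeMap<>();
    int next = 0;
    for (Integer f : used) {
      mapping.put(f, next++);
    }
    return mapping;
  }

  public static void main(String[] args) {
    // Input row has four fields; the expression uses $1 and the sub-query correlates on $3.
    ToySubQuery q = new ToySubQuery(Arrays.asList(1), Arrays.asList(3));
    Map<Integer, Integer> mapping = trimMapping(q.usedFields());
    ToySubQuery trimmed = q.remap(mapping);
    System.out.println(mapping + " expr=" + trimmed.exprOperands
        + " correlated=" + trimmed.correlatedOperands);
  }
}
```

The point of the sketch is that the correlated field ($3 here) survives trimming and gets its offset shifted by the same mapping as the expression operand, without the trimmer needing any sub-query-specific logic.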
