I would also like to have the SubQuery contain the correlation id instead of the RelNodes. This would allow almost all rules to just work on nodes with correlated subqueries in them, without needing special logic.
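
As a rough illustration of why explicit operands help, here is a toy sketch. It is not Calcite code: `Node`, `usedFields`, `shift`, and `render` are hypothetical names, and the extra unused column at $0 is an assumption added only so that there is something to trim. A generic "which fields are referenced" pass keeps the correlated field and shifts it correctly once it is an ordinary operand of the subquery node.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

public class ExplicitCorrelationSketch {
  /** Toy expression node; a stand-in for a Rex node, not a Calcite class. */
  static class Node {
    final String op;            // "REF" for an input reference, else an operator name
    final int index;            // field index, meaningful only when op is "REF"
    final List<Node> operands;  // child expressions

    Node(String op, int index, List<Node> operands) {
      this.op = op;
      this.index = index;
      this.operands = operands;
    }

    static Node ref(int index) {
      return new Node("REF", index, new ArrayList<Node>());
    }

    static Node call(String op, Node... operands) {
      return new Node(op, -1, Arrays.asList(operands));
    }
  }

  /** Collects every input field referenced anywhere under the node. */
  static TreeSet<Integer> usedFields(Node node) {
    TreeSet<Integer> used = new TreeSet<Integer>();
    collect(node, used);
    return used;
  }

  private static void collect(Node node, TreeSet<Integer> used) {
    if (node.op.equals("REF")) {
      used.add(node.index);
    }
    for (Node operand : node.operands) {
      collect(operand, used);
    }
  }

  /** Rewrites references after trimming: old index becomes its rank among kept fields. */
  static Node shift(Node node, TreeSet<Integer> kept) {
    if (node.op.equals("REF")) {
      return Node.ref(kept.headSet(node.index).size());
    }
    List<Node> shifted = new ArrayList<Node>();
    for (Node operand : node.operands) {
      shifted.add(shift(operand, kept));
    }
    return new Node(node.op, -1, shifted);
  }

  static String render(Node node) {
    if (node.op.equals("REF")) {
      return "$" + node.index;
    }
    StringBuilder sb = new StringBuilder(node.op).append("(");
    for (int i = 0; i < node.operands.size(); i++) {
      if (i > 0) {
        sb.append(", ");
      }
      sb.append(render(node.operands.get(i)));
    }
    return sb.append(")").toString();
  }

  public static void main(String[] args) {
    // Assumed input row: $0 = unused column, $1 = t1.c1, $2 = t1.c2.
    // Implicit contract: only the IN operand is visible; t1.c2 is hidden in the rel.
    Node implicit = Node.call("IN", Node.ref(1));
    // Explicit contract: the correlated field t1.c2 is an operand too.
    Node explicit = Node.call("IN", Node.ref(1), Node.ref(2));

    System.out.println(usedFields(implicit));           // [1]: $2 would be trimmed
    TreeSet<Integer> kept = usedFields(explicit);       // [1, 2]: $2 survives
    System.out.println(render(shift(explicit, kept)));  // IN($0, $1)
  }
}
```

In the implicit form the pass sees only $1, so the field backing the correlated variable looks unused. In the explicit form the same pass keeps it and renumbers it with no subquery-specific code.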

On Mon, Dec 5, 2022 at 6:20 PM Julian Hyde <[email protected]> wrote:
> I don’t see any problem with this proposal. But I’m too busy to give it
> serious thought. Can someone else please review?
>
> > On Dec 1, 2022, at 1:23 PM, James Starr <[email protected]> wrote:
> >
> > Hi Julian,
> >
> > I want to propose changing Calcite's RexSubQuery so that the SubQuery
> > would have up to N+M rex children (N is the number of correlated
> > variables used and M is the number of rex nodes for the expression).
> >
> > SELECT t1.c1 IN (
> >   SELECT t2.c1
> >   FROM (VALUES (1, 2)) AS t2(c1, c2)
> >   WHERE t1.c2 = t2.c2
> > )
> > FROM (VALUES (1, 2)) AS t1(c1, c2)
> >
> > Would result in a query that looks like
> > PROJECT ( ($0 IN {rel...}, $1)
> >   VALUES(..)
> >
> > Previously only $0 would have been a child of the RexSubQuery, due to
> > being used in the IN clause; after the change, both $0 and $1 would be
> > children. When subqueries are evaluated or decorrelated, they would
> > need to dereference their correlated variables through their arguments
> > instead of directly using the scope of the RelNode. This would allow
> > for more streamlined pass-through logic when manipulating
> > RexSubQueries, since their implicit correlated variable contract with
> > a RelNode is now explicit. For instance, RelFieldTrimmer currently
> > trims fields that are only used as correlated variables.
> > RelFieldTrimmer also does not correctly shift the offset of referenced
> > correlated variables. However, if the inputs of correlated variables
> > are explicitly called out as children of the RexSubQuery, then the
> > fields would not be trimmed and would be shifted correctly.
> >
> > I hope this makes it clearer what I am proposing.
> >
> > James
> >
> > On Thu, Dec 1, 2022 at 12:01 PM Julian Hyde <[email protected]> wrote:
> >
> >> I do agree that a correlated sub-query is a function call. If you
> >> write your queries using CROSS APPLY this becomes clear.
> >>
> >> Decorrelation is very useful. For some execution engines, especially
> >> the highly parallel/distributed ones, stopping and restarting
> >> subqueries requires a lot of communication. So Calcite supports
> >> decorrelation, and it is Calcite’s preferred execution strategy. But
> >> there are definitely engines, and queries, that are better executed
> >> in correlated form.
> >>
> >> By the way, the Froid project [1] takes this idea to the limit, and
> >> applies decorrelation techniques to function calls (creating ‘magic
> >> sets’ of all possible arguments).
> >>
> >> Calcite’s decorrelation code is old and brittle. But if I recall
> >> correctly, you don’t have to do decorrelation in SqlToRelConverter;
> >> you can defer, and do the decorrelation using planner rules.
> >>
> >> Julian
> >>
> >> [1] https://dl.acm.org/doi/10.1145/3186728.3164140
> >>
> >>> On Dec 1, 2022, at 11:09 AM, James Starr <[email protected]> wrote:
> >>>
> >>> Currently sub-query correlated variables have a brittle contract
> >>> with their containing RelNode. Simple rules, such as ones that
> >>> transpose filters and projects, are unaware of this contract, and it
> >>> would be difficult to retrofit all the rules to be sub-query aware.
> >>>
> >>> A correlated sub-query is logically a function call whose parameters
> >>> are the values used for the correlated inputs. If the SubQuery
> >>> object was structured such that the inputs that are used as
> >>> correlated variables were explicit sub nodes of the sub-query
> >>> object, then most rules and utilities, such as the trimmer, would
> >>> just work as expected. SqlToRel could also be simplified since there
> >>> would only be one place to add the CorrelationId, as opposed to 3.
> >>>
> >>> James
