Hi Julian,

Thanks for your reply!

> I believe that variables can only be set in the current RelNode. (Read a row
> from input, set the variable, then evaluate a Rex expression or restart the
> right input. It’s like a ‘for’ loop.)

Actually, this is in line with my expectations. So, I'll file a ticket for this.

--
Regards,
Konstantin Orlov


> On 17 Sep 2021, at 21:44, Julian Hyde <[email protected]> wrote:
>
> The sentence "Note: only {@link org.apache.calcite.rel.core.Correlate} should
> set variables." is no longer true, now that we have added correlated Filter
> and, I believe, correlated Project. Maybe we should also add correlated Join
> (in case the ON clause uses correlated variables).
>
> I believe that variables can only be set in the current RelNode. (Read a row
> from input, set the variable, then evaluate a Rex expression or restart the
> right input. It’s like a ‘for’ loop.) In which case, what you are seeing is
> wrong. But I’m not 100% sure.
>
> Note that, unlike *set*, variables can be *used* anywhere within a tree
> (generally in the right-hand input of a Correlate).
>
> Maybe you could propose better javadoc. That is worth doing independent of
> any bugs that you are trying to fix.
>
> Julian
>
>
>> On Sep 17, 2021, at 5:43 AM, Konstantin Orlov <[email protected]> wrote:
>>
>> Hi, folks
>>
>> I have a question about org.apache.calcite.rel.RelNode#getVariablesSet.
>> The Javadoc says it returns the variables that are set by the current node:
>>
>> /**
>>  * Returns the variables that are set in this relational
>>  * expression but also used and therefore not available to parents of this
>>  * relational expression.
>>  *
>>  * <p>Note: only {@link org.apache.calcite.rel.core.Correlate} should set
>>  * variables.
>>  *
>>  * @return Names of variables which are set in this relational
>>  * expression
>>  */
>> Set<CorrelationId> getVariablesSet();
>>
>>
>> But I've got a plan where a node returns all variables used by its child
>> nodes, regardless of whether a variable is set by the current node or by a
>> parent node.
>>
>> The original query is:
>>
>> SELECT *
>> FROM t1 as "outer"
>> WHERE a > (
>>   SELECT COUNT(*)
>>   FROM t1 as "inner"
>>   WHERE "inner".a IN (
>>     SELECT *
>>     FROM table(system_range("inner".a, "inner".b + "outer".b))
>>   )
>> )
>>
>> After SQL-to-Rel translation I get the following plan:
>>
>> LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
>>   LogicalFilter(condition=[>($2, $SCALAR_QUERY({
>> LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
>>   LogicalFilter(condition=[IN($2, {
>> LogicalProject(X=[$0])
>>   LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
>> })], variablesSet=[[$cor0]])
>>     LogicalTableScan(table=[[PUBLIC, T1]])
>> }))], variablesSet=[[$cor2]])
>>     LogicalTableScan(table=[[PUBLIC, T1]])
>>
>> Every LogicalFilter introduces its own correlation variable, and everything
>> is OK so far.
>>
>> But then I apply SubQueryRemoveRule and the new plan looks like this:
>>
>> LogicalProject(A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
>>   LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
>>     LogicalFilter(condition=[>($2, $7)])
>>       LogicalCorrelate(correlation=[$cor2], joinType=[left], requiredColumns=[{3}])
>>         LogicalTableScan(table=[[PUBLIC, T1]])
>>         LogicalAggregate(group=[{}], COUNT(*)=[COUNT()])
>>           LogicalProject(_KEY=[$0], _VAL=[$1], A=[$2], B=[$3], C=[$4], D=[$5], E=[$6])
>>             LogicalJoin(condition=[=($2, $7)], joinType=[inner])
>>               LogicalTableScan(table=[[PUBLIC, T1]])
>>               LogicalAggregate(group=[{0}])
>>                 LogicalProject(X=[$0])
>>                   LogicalTableFunctionScan(invocation=[SYSTEM_RANGE($cor0.A, +($cor0.B, $cor2.B))], rowType=[RecordType(BIGINT X)])
>>
>>
>> At this point LogicalJoin.getVariablesSet() returns both "cor0" and "cor2",
>> which doesn't seem right.
>>
>> Is this behaviour expected, or is it a bug?
>>
>> --
>> Regards,
>> Konstantin Orlov
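
As an illustration of the 'for' loop analogy quoted above, here is a toy sketch in plain Java. It is not Calcite code and uses no Calcite API: the class CorrelateSketch and the methods correlate and rangeUpTo are hypothetical names invented for this example. It only models the idea under discussion, namely that the variable is set by the node that owns the loop, while it may be read anywhere inside the right-hand input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

// Toy model only; all names here are hypothetical, not part of Calcite.
class CorrelateSketch {

    // Stand-in for a right-hand input that *uses* the correlation variable:
    // it produces the rows 1..cor for the currently bound value.
    static List<Integer> rangeUpTo(int cor) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 1; i <= cor; i++) {
            rows.add(i);
        }
        return rows;
    }

    // Stand-in for the node that *sets* the variable: for each left row,
    // bind the variable, restart the right input, and emit the joined rows.
    static List<String> correlate(List<Integer> left,
                                  IntFunction<List<Integer>> right) {
        List<String> out = new ArrayList<>();
        for (int cor : left) {                  // read a row, set the variable
            for (int r : right.apply(cor)) {    // restart the right input
                out.add(cor + ":" + r);
            }
        }
        return out;
    }
}
```

Calling CorrelateSketch.correlate(Arrays.asList(1, 2), CorrelateSketch::rangeUpTo) re-runs the right input once per left row. In this model only correlate binds the variable; rangeUpTo merely reads it, which is the set-versus-used distinction the thread is about.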
