I have understood you and reproduced the problem. It looks like CoreRules.JOIN_TO_CORRELATE causes the decorrelation to fail. For this, I need to do more work to find out why. For your Case, you can just add these rules: CoreRules.FILTER_SUB_QUERY_TO_CORRELATE, CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE, CoreRules.JOIN_SUB_QUERY_TO_CORRELATE, CoreRules. FILTER_CORRELATE and try the decorrelation
> 2024年8月17日 07:47,Mihai Budiu <mbu...@gmail.com> 写道: > > The following RelOptRulesTest reproduces the problem: > > @Test void testDecorrelate() { > final String query = "SELECT 3 in (SELECT deptno FROM emp)"; > sql(query).withRule( > CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE, > CoreRules.JOIN_TO_CORRELATE) > .withLateDecorrelate(true) > .check(); > } > > This produces 3 plans: before, mid, after. > Before: > > LogicalProject(EXPR$0=[IN(3, { > LogicalProject(DEPTNO=[$7]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > })]) > LogicalValues(tuples=[[{ 0 }]]) > > Mid, after the two rules that I understand you have recommended: > > LogicalProject(EXPR$0=[IS NOT NULL($1)]) > LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}]) > LogicalValues(tuples=[[{ 0 }]]) > LogicalAggregate(group=[{0}]) > LogicalProject(cs=[true]) > LogicalFilter(condition=[=(3, $0)]) > LogicalProject(DEPTNO=[$7]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > After decorrelation; > > LogicalProject(EXPR$0=[IS NOT NULL($1)]) > LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}]) > LogicalValues(tuples=[[{ 0 }]]) > LogicalAggregate(group=[{0}]) > LogicalProject(cs=[true]) > LogicalProject(DEPTNO=[$7]) > LogicalFilter(condition=[=(3, $7)]) > LogicalTableScan(table=[[CATALOG, SALES, EMP]]) > > Notice that the decorrelation has not removed the LogicalCorrelate. > ________________________________ > From: suibianwanwan <1597226...@qq.com.INVALID> > Sent: Wednesday, August 14, 2024 7:59 PM > To: dev <dev@calcite.apache.org> > Subject: Re: Re:Query planner structure > > I'm trying to reproduce the problem, but the result of my convert Query > doesn't have the LogicalCorrelate. > > > So I tried to create a test in calcite org.apache.calcite.tools.PlannerTest > as follows: > > > Planner planner = getPlanner(null); > final String sql = "SELECT 3 in (SELECT \"deptno\" FROM > \"emps\")"; > SqlNode parse = planner.parse(sql); > SqlNode validate = planner.validate(parse); > RelNode convert = planner.rel(validate).rel; > > relnode as follows: > LogicalProject.NONE.[](input=LogicalValues#6,exprs=[IN(3, { > LogicalProject(deptno=[$1]) > LogicalTableScan(table=[[hr, emps]]) > })]) > > > Do you have a way to reproduce this in calcite? > > > Regards, > suibianwanwan > ------------------ Original ------------------ > From: > "dev" > > <mbu...@gmail.com>; > Date: Thu, Aug 15, 2024 06:23 AM > To: "dev"<dev@calcite.apache.org>; > > Subject: Re: Re:Query planner structure > > > I have tried this approach, but the decorrelator fails to decorrelate many > queries. Here is a simple query where after decorrelation there are still > LogicalCorrelate nodes left: > > SELECT 3 in (SELECT empno FROM emp) > > Maybe I am missing some optimization steps? > > From my reading of the code, the Calcite decorrelator seems to be a heuristic > decorrelator, which only handles a few types of patterns of correlated > subqueries. If your query doesn't fall into one of these patterns, the > decorrelator leaves it unchanged. > > For example, this query, after converting it to use LogicalCorrelate as you > have described, becomes: > > LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211 > LogicalCorrelate(correlation=[$cor0], > joinType=[left], requiredColumns=[{}]), id = 213 > LogicalValues(tuples=[[{ 0 }]]), > id = 180 > LogicalAggregate(group=[{0}]), id > = 207 > > LogicalProject(cs=[true]), id = 205 > > LogicalFilter(condition=[=(3, $0)]), id = 203 > > LogicalProject(EMPNO=[$0]), id = 201 > > LogicalTableScan(table=[[schema, EMP]]), id = 182 > > The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot decorrelate > this query. > > Do you have a solution that handles this case? > > BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.: > https://issues.apache.org/jira/browse/CALCITE-5627 > > Thank you, > Mihai > ________________________________ > From: suibianwanwan33 <suibianwanwa...@foxmail.com> > Sent: Tuesday, August 13, 2024 10:31 PM > To: dev <dev@calcite.apache.org> > Subject: Re:Query planner structure > > Hi, this is one of the classic optimization cases in my work. > First you need to add all CORRELATE optimizations such as > FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a > decorrelateQuery, and finally use a HepPlanner to add the JOIN_CONDITION_PUSH > and the FILTER_INTO_JOIN. my guess is that the Cartesian product of this > query will be optimized > > > > > ------------------&nbsp;Original&nbsp;------------------ > From: > > "dev" > <mbu...@gmail.com&gt;; > Date:&nbsp;Wed, Aug 14, 2024 12:51 PM > To:&nbsp;"dev@calcite.apache.org"<dev@calcite.apache.org&gt;; > > Subject:&nbsp;Query planner structure > > > Hello all, > > Calcite contains many optimization rules. It is not obvious to me how to > assemble them into a good planner. Can people recommend tested query planners > based on Calcite that are open-source and that can be used as a starting > point? > > In particular, I am having trouble making a HEP-planner produce a good plan > for Q21 from TPC-H; the planner I assembled produces many cartesian products. > > For reference, here is the variant of Q21 that I am using: > > create view q21 ( > &nbsp;&nbsp;&nbsp; s_name, > &nbsp;&nbsp;&nbsp; numwait > ) as > select > &nbsp;&nbsp;&nbsp; s_name, > &nbsp;&nbsp;&nbsp; count(*) as numwait > from > &nbsp;&nbsp;&nbsp; supplier, > &nbsp;&nbsp;&nbsp; lineitem l1, > &nbsp;&nbsp;&nbsp; orders, > &nbsp;&nbsp;&nbsp; nation > where > &nbsp;&nbsp;&nbsp; s_suppkey = l1.l_suppkey > &nbsp;&nbsp;&nbsp; and o_orderkey = l1.l_orderkey > &nbsp;&nbsp;&nbsp; and o_orderstatus = 'F' > &nbsp;&nbsp;&nbsp; and l1.l_receiptdate &gt; l1.l_commitdate > &nbsp;&nbsp;&nbsp; and exists ( > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > * > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; from > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > lineitem l2 > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > l2.l_orderkey = l1.l_orderkey > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > and l2.l_suppkey <&gt; l1.l_suppkey > &nbsp;&nbsp;&nbsp; ) > &nbsp;&nbsp;&nbsp; and not exists ( > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; select > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > * > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; from > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > lineitem l3 > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; where > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > l3.l_orderkey = l1.l_orderkey > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > and l3.l_suppkey <&gt; l1.l_suppkey > &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; > and l3.l_receiptdate &gt; l3.l_commitdate > &nbsp;&nbsp;&nbsp; ) > &nbsp;&nbsp;&nbsp; and s_nationkey = n_nationkey > &nbsp;&nbsp;&nbsp; and n_name = 'GERMANY' > group by > &nbsp;&nbsp;&nbsp; s_name > order by > &nbsp;&nbsp;&nbsp; numwait desc, > &nbsp;&nbsp;&nbsp; s_name > LIMIT 100; > > Thank you for any suggestions! > Mihai