I have understood you and reproduced the problem. It looks like 
CoreRules.JOIN_TO_CORRELATE causes the decorrelation to fail. For this, I need 
to do more work to find out why.
For your Case, you can just add these rules:
CoreRules.FILTER_SUB_QUERY_TO_CORRELATE, 
CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE, 
CoreRules.JOIN_SUB_QUERY_TO_CORRELATE, CoreRules. FILTER_CORRELATE 
and try the decorrelation

> 2024年8月17日 07:47,Mihai Budiu <mbu...@gmail.com> 写道:
> 
> The following RelOptRulesTest reproduces the problem:
> 
>  @Test void testDecorrelate() {
>    final String query = "SELECT 3 in (SELECT deptno FROM emp)";
>    sql(query).withRule(
>        CoreRules.PROJECT_SUB_QUERY_TO_CORRELATE,
>            CoreRules.JOIN_TO_CORRELATE)
>        .withLateDecorrelate(true)
>        .check();
>  }
> 
> This produces 3 plans: before, mid, after.
> Before:
> 
> LogicalProject(EXPR$0=[IN(3, {
> LogicalProject(DEPTNO=[$7])
>  LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> })])
>  LogicalValues(tuples=[[{ 0 }]])
> 
> Mid, after the two rules that I understand you have recommended:
> 
> LogicalProject(EXPR$0=[IS NOT NULL($1)])
>  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
>    LogicalValues(tuples=[[{ 0 }]])
>    LogicalAggregate(group=[{0}])
>      LogicalProject(cs=[true])
>        LogicalFilter(condition=[=(3, $0)])
>          LogicalProject(DEPTNO=[$7])
>            LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> 
> After decorrelation;
> 
> LogicalProject(EXPR$0=[IS NOT NULL($1)])
>  LogicalCorrelate(correlation=[$cor0], joinType=[left], requiredColumns=[{}])
>    LogicalValues(tuples=[[{ 0 }]])
>    LogicalAggregate(group=[{0}])
>      LogicalProject(cs=[true])
>        LogicalProject(DEPTNO=[$7])
>          LogicalFilter(condition=[=(3, $7)])
>            LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> 
> Notice that the decorrelation has not removed the LogicalCorrelate.
> ________________________________
> From: suibianwanwan <1597226...@qq.com.INVALID>
> Sent: Wednesday, August 14, 2024 7:59 PM
> To: dev <dev@calcite.apache.org>
> Subject: Re: Re:Query planner structure
> 
> I'm trying to reproduce the problem, but the result of my convert Query 
> doesn't have the LogicalCorrelate.
> 
> 
> So I tried to create a test in calcite org.apache.calcite.tools.PlannerTest 
> as follows:
> 
> 
> &nbsp; &nbsp; Planner planner = getPlanner(null);
> &nbsp; &nbsp; final String sql = "SELECT 3 in (SELECT \"deptno\" FROM 
> \"emps\")";
> &nbsp; &nbsp; SqlNode parse = planner.parse(sql);
> &nbsp; &nbsp; SqlNode validate = planner.validate(parse);
> &nbsp; &nbsp; RelNode convert = planner.rel(validate).rel;
> 
> relnode as follows:
> LogicalProject.NONE.[](input=LogicalValues#6,exprs=[IN(3, {
> LogicalProject(deptno=[$1])
> &nbsp; LogicalTableScan(table=[[hr, emps]])
> })])
> 
> 
> Do you have a way to reproduce this in calcite?
> 
> 
> Regards,
> suibianwanwan
> ------------------&nbsp;Original&nbsp;------------------
> From:                                                                         
>                                                "dev"                          
>                                                           
> <mbu...@gmail.com&gt;;
> Date:&nbsp;Thu, Aug 15, 2024 06:23 AM
> To:&nbsp;"dev"<dev@calcite.apache.org&gt;;
> 
> Subject:&nbsp;Re: Re:Query planner structure
> 
> 
> I have tried this approach, but the decorrelator fails to decorrelate many 
> queries. Here is a simple query where after decorrelation there are still 
> LogicalCorrelate nodes left:
> 
> SELECT 3 in (SELECT empno FROM emp)
> 
> Maybe I am missing some optimization steps?
> 
> From my reading of the code, the Calcite decorrelator seems to be a heuristic 
> decorrelator, which only handles a few types of patterns of correlated 
> subqueries. If your query doesn't fall into one of these patterns, the 
> decorrelator leaves it unchanged.
> 
> For example, this query, after converting it to use LogicalCorrelate as you 
> have described, becomes:
> 
> &nbsp;&nbsp;&nbsp; LogicalProject(EXPR$0=[IS NOT NULL($1)]), id = 211
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalCorrelate(correlation=[$cor0], 
> joinType=[left], requiredColumns=[{}]), id = 213
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalValues(tuples=[[{ 0 }]]), 
> id = 180
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; LogicalAggregate(group=[{0}]), id 
> = 207
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
> LogicalProject(cs=[true]), id = 205
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; 
> LogicalFilter(condition=[=(3, $0)]), id = 203
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  LogicalProject(EMPNO=[$0]), id = 201
> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  LogicalTableScan(table=[[schema, EMP]]), id = 182
> 
> The call to RelDecorrelator.decorrelateQuery(rel, builder) cannot decorrelate 
> this query.
> 
> Do you have a solution that handles this case?
> 
> BTW: I also think that the FILTER_INTO_JOIN rule is unsound , e.g.: 
> https://issues.apache.org/jira/browse/CALCITE-5627
> 
> Thank you,
> Mihai
> ________________________________
> From: suibianwanwan33 <suibianwanwa...@foxmail.com&gt;
> Sent: Tuesday, August 13, 2024 10:31 PM
> To: dev <dev@calcite.apache.org&gt;
> Subject: Re:Query planner structure
> 
> Hi, this is one of the classic optimization cases in my work.
> First you need to add all CORRELATE optimizations such as 
> FILTER_SUB_QUERY_TO_CORRELATE in a HepPlanner, after that you need to do a 
> decorrelateQuery, and finally use a HepPlanner to add the JOIN_CONDITION_PUSH 
> and the FILTER_INTO_JOIN. my guess is that the Cartesian product of this 
> query will be optimized
> 
> 
> 
> 
> ------------------&amp;nbsp;Original&amp;nbsp;------------------
> From:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  
> "dev"&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
>  <mbu...@gmail.com&amp;gt;;
> Date:&amp;nbsp;Wed, Aug 14, 2024 12:51 PM
> To:&amp;nbsp;"dev@calcite.apache.org"<dev@calcite.apache.org&amp;gt;;
> 
> Subject:&amp;nbsp;Query planner structure
> 
> 
> Hello all,
> 
> Calcite contains many optimization rules. It is not obvious to me how to 
> assemble them into a good planner. Can people recommend tested query planners 
> based on Calcite that are open-source and that can be used as a starting 
> point?
> 
> In particular, I am having trouble making a HEP-planner produce a good plan 
> for Q21 from TPC-H; the planner I assembled produces many cartesian products.
> 
> For reference, here is the variant of Q21 that I am using:
> 
> create view q21 (
> &amp;nbsp;&amp;nbsp;&amp;nbsp; s_name,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; numwait
> ) as
> select
> &amp;nbsp;&amp;nbsp;&amp;nbsp; s_name,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; count(*) as numwait
> from
> &amp;nbsp;&amp;nbsp;&amp;nbsp; supplier,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; lineitem l1,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; orders,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; nation
> where
> &amp;nbsp;&amp;nbsp;&amp;nbsp; s_suppkey = l1.l_suppkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and o_orderkey = l1.l_orderkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and o_orderstatus = 'F'
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and l1.l_receiptdate &amp;gt; l1.l_commitdate
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and exists (
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  *
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  lineitem l2
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  l2.l_orderkey = l1.l_orderkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  and l2.l_suppkey <&amp;gt; l1.l_suppkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp; )
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and not exists (
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; select
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  *
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; from
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  lineitem l3
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; where
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  l3.l_orderkey = l1.l_orderkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  and l3.l_suppkey <&amp;gt; l1.l_suppkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;
>  and l3.l_receiptdate &amp;gt; l3.l_commitdate
> &amp;nbsp;&amp;nbsp;&amp;nbsp; )
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and s_nationkey = n_nationkey
> &amp;nbsp;&amp;nbsp;&amp;nbsp; and n_name = 'GERMANY'
> group by
> &amp;nbsp;&amp;nbsp;&amp;nbsp; s_name
> order by
> &amp;nbsp;&amp;nbsp;&amp;nbsp; numwait desc,
> &amp;nbsp;&amp;nbsp;&amp;nbsp; s_name
> LIMIT 100;
> 
> Thank you for any suggestions!
> Mihai

Reply via email to