[ 
https://issues.apache.org/jira/browse/CALCITE-2028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16235159#comment-16235159
 ] 

Liao Xintao commented on CALCITE-2028:
--------------------------------------

I try to fix it by re-getting the "variablesSet" from the subquery, instead of 
that getting from the Filter directly, and it works.
But I'm not sure that something more should be done. 

> SubQueryRemoveRule should create Join, not Correlate, for un-correlated 
> sub-queries
> -----------------------------------------------------------------------------------
>
>                 Key: CALCITE-2028
>                 URL: https://issues.apache.org/jira/browse/CALCITE-2028
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Liao Xintao
>            Assignee: Julian Hyde
>            Priority: Normal
>              Labels: sub-query
>
> I'm doing some work about running TPC-H queries on Flink, and encounter a 
> problem that q20 can't be de-correlated appropriately when 
> ConfigBuilder.withExpand(false).
> TPC-H q20 query has a WHERE clause composed of an IN predicate with an 
> un-correlated subquery and an EXISTS predicate with a correlated subquery.
> Therefore, in SubQueryRemoveRule.FILTER, the "variablesSet" from the entire 
> Filter rel, referenced by the correlated subquery of EXISTS predicate 
> actually,  applies to the un-correlated subquery of IN predicate, and finally 
> generates an inner-join LogicalCorrelate which will be ignored in 
> RelDecorrelator.
> # The q20 query:
> {code: sql}
> -- tpch20
> select
>   s.s_name,
>   s.s_address
> from
>   supplier s,
>   nation n
> where
>   s.s_suppkey in (
>     select
>       ps.ps_suppkey
>     from
>       partsupp ps
>     where
>       ps. ps_partkey in (
>         select
>           p.p_partkey
>         from
>           part p
>         where
>           p.p_name like 'antique%'
>       )
>       and ps.ps_availqty > (
>         select
>           0.5 * sum(l.l_quantity)
>         from
>           lineitem l
>         where
>           l.l_partkey = ps.ps_partkey
>           and l.l_suppkey = ps.ps_suppkey
>           and l.l_shipdate >= date '1993-01-01'
>           and l.l_shipdate < date '1993-01-01' + interval '1' year
>       )
>   )
>   and s.s_nationkey = n.n_nationkey
>   and n.n_name = 'KENYA'
> order by
>   s.s_name
> {code}
> * Plan before SubQueryRemoveRule.FILTER:
> {code}
> LogicalSort(sort0=[$0], dir0=[ASC])
>   LogicalProject(s_name=[$1], s_address=[$2])
>     LogicalFilter(condition=[AND(IN($0, {
>               LogicalProject(ps_suppkey=[$1])
>                 LogicalFilter(condition=[AND(IN($0, {
>                               LogicalProject(p_partkey=[$0])
>                                 LogicalFilter(condition=[LIKE($1, 
> 'antique%')])
>                                   LogicalTableScan(table=[[part, source: 
> [selectedFields=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, 
> p_container, p_retailprice, p_comment]]]])
>                               }), >($2, $SCALAR_QUERY({
>                               LogicalProject(EXPR$0=[*(0.5, $0)])
>                                 LogicalAggregate(group=[{}], agg#0=[SUM($0)])
>                                   LogicalProject(l_quantity=[$4])
>                                     LogicalFilter(condition=[AND(=($1, 
> $cor0.ps_partkey), =($2, $cor0.ps_suppkey), >=($10, 1993-01-01), <($10, 
> DATETIME_PLUS(1993-01-01, 12)))])
>                                       LogicalTableScan(table=[[lineitem, 
> source: [selectedFields=[l_orderkey, l_partkey, l_suppkey, l_linenumber, 
> l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, 
> l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, 
> l_comment]]]])
>                               })))], variablesSet=[[$cor0]])
>                   LogicalTableScan(table=[[partsupp, source: 
> [selectedFields=[ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, 
> ps_comment]]]])
>               }), =($3, $7), =($8, 'KENYA'))])
>       LogicalJoin(condition=[true], joinType=[inner])
>         LogicalTableScan(table=[[supplier, source: 
> [selectedFields=[s_suppkey, s_name, s_address, s_nationkey, s_phone, 
> s_acctbal, s_comment]]]])
>         LogicalTableScan(table=[[nation, source: 
> [selectedFields=[n_nationkey, n_name, n_regionkey, n_comment]]]])
> {code}
> * Plan after SubQueryRemoveRule.FILTER (also differs from the plan when 
> ConfigBuilder.withExpand(true)):
> {code}
> LogicalSort(sort0=[$0], dir0=[ASC])
>   LogicalProject(s_name=[$1], s_address=[$2])
>     LogicalProject(s_suppkey=[$0], s_name=[$1], s_address=[$2], 
> s_nationkey=[$3], s_phone=[$4], s_acctbal=[$5], s_comment=[$6], 
> n_nationkey=[$7], n_name=[$8], n_regionkey=[$9], n_comment=[$10])
>       LogicalFilter(condition=[AND(=($3, $7), =($8, 'KENYA'))])
>         LogicalJoin(condition=[=($0, $11)], joinType=[inner])
>           LogicalJoin(condition=[true], joinType=[inner])
>             LogicalTableScan(table=[[supplier, source: 
> [selectedFields=[s_suppkey, s_name, s_address, s_nationkey, s_phone, 
> s_acctbal, s_comment]]]])
>             LogicalTableScan(table=[[nation, source: 
> [selectedFields=[n_nationkey, n_name, n_regionkey, n_comment]]]])
>           LogicalAggregate(group=[{0}])
>             LogicalProject(ps_suppkey=[$1])
>               LogicalProject(ps_partkey=[$0], ps_suppkey=[$1], 
> ps_availqty=[$2], ps_supplycost=[$3], ps_comment=[$4])
>                 LogicalFilter(condition=[>($2, $6)])
>                   LogicalCorrelate(correlation=[$cor0], joinType=[left], 
> requiredColumns=[{0, 1}])
>                     LogicalFilter(condition=[=($0, $5)])
>                       LogicalCorrelate(correlation=[$cor0], joinType=[inner], 
> requiredColumns=[{}])   <<<--- the problem is here!
>                         LogicalTableScan(table=[[partsupp, source: 
> [selectedFields=[ps_partkey, ps_suppkey, ps_availqty, ps_supplycost, 
> ps_comment]]]])
>                         LogicalAggregate(group=[{0}])
>                           LogicalProject(p_partkey=[$0])
>                             LogicalFilter(condition=[LIKE($1, 'antique%')])
>                               LogicalTableScan(table=[[part, source: 
> [selectedFields=[p_partkey, p_name, p_mfgr, p_brand, p_type, p_size, 
> p_container, p_retailprice, p_comment]]]])
>                     LogicalAggregate(group=[{}], agg#0=[SINGLE_VALUE($0)])
>                       LogicalProject(EXPR$0=[*(0.5, $0)])
>                         LogicalAggregate(group=[{}], agg#0=[SUM($0)])
>                           LogicalProject(l_quantity=[$4])
>                             LogicalFilter(condition=[AND(=($1, 
> $cor0.ps_partkey), =($2, $cor0.ps_suppkey), >=($10, 1993-01-01), <($10, 
> DATETIME_PLUS(1993-01-01, 12)))])
>                               LogicalTableScan(table=[[lineitem, source: 
> [selectedFields=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, 
> l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, 
> l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment]]]])
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to