[
https://issues.apache.org/jira/browse/CALCITE-4981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Brandon Chong updated CALCITE-4981:
-----------------------------------
Issue Type: Improvement (was: Bug)
> tpch q21 query plan has many cartesian joins after decorrelation
> ----------------------------------------------------------------
>
> Key: CALCITE-4981
> URL: https://issues.apache.org/jira/browse/CALCITE-4981
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.27.0
> Reporter: Yifei Yang
> Priority: Major
>
> Hi, I found the query plan of tpch q21 has 6 cartesian joins (totally 18
> joins) after calling the decorrelator. It's first reported
> [there|https://issues.apache.org/jira/browse/CALCITE-320], but it seems still
> an issue in 1.27.0. I'm not sure if it's a bug of Calcite or I need to
> incorporate some additional rules to fix it. Thank you!
> query (tpch q21):
> {quote}{{select}}
> {{ s.s_name,}}
> {{ count(*) as numwait}}
> {{from}}
> {{ supplier s,}}
> {{ lineitem l1,}}
> {{ orders o,}}
> {{ nation n}}
> {{where}}
> {{ s.s_suppkey = l1.l_suppkey}}
> {{ and o.o_orderkey = l1.l_orderkey}}
> {{ and o.o_orderstatus = 'F'}}
> {{ and l1.l_receiptdate > l1.l_commitdate}}
> {{ and exists (}}
> {{ select}}
> {{ *}}
> {{ from}}
> {{ lineitem l2}}
> {{ where}}
> {{ l2.l_orderkey = l1.l_orderkey}}
> {{ and l2.l_suppkey <> l1.l_suppkey}}
> {{ )}}
> {{ and not exists (}}
> {{ select}}
> {{ *}}
> {{ from}}
> {{ lineitem l3}}
> {{ where}}
> {{ l3.l_orderkey = l1.l_orderkey}}
> {{ and l3.l_suppkey <> l1.l_suppkey}}
> {{ and l3.l_receiptdate > l3.l_commitdate}}
> {{ )}}
> {{ and s.s_nationkey = n.n_nationkey}}
> {{ and n.n_name = 'BRAZIL'}}
> {{group by}}
> {{ s.s_name}}
> {{order by}}
> {{ numwait desc,}}
> {{ s.s_name}}
> {{limit 100}}
> {quote}
> Plan generated:
> {quote}EnumerableProject(S_NAME=[$0], NUMWAIT=[$1])
> EnumerableLimitSort(sort0=[$1], sort1=[$0], dir0=[DESC], dir1=[ASC],
> fetch=[100])
> EnumerableAggregate(group=[\\{0}], NUMWAIT=[COUNT()])
> EnumerableProject(s_name=[$3])
> EnumerableFilter(condition=[IS NULL($39)])
> EnumerableHashJoin(condition=[AND(=($11, $37), =($15, $38))],
> joinType=[left])
> EnumerableProject(s_comment=[$0], s_phone=[$1], s_nationkey=[$2],
> s_name=[$3], s_address=[$4], s_acctbal=[$5], s_suppkey=[$6],
> l_returnflag=[$7], l_receiptdate=[$8], l_tax=[$9], l_shipmode=[$10],
> l_suppkey=[$11], l_shipdate=[$12], l_commitdate=[$13], l_partkey=[$14],
> l_orderkey=[$15], l_quantity=[$16], l_linestatus=[$17], l_comment=[$18],
> l_extendedprice=[$19], l_linenumber=[$20], l_discount=[$21],
> l_shipinstruct=[$22], o_orderstatus=[$23], o_clerk=[$24], o_orderdate=[$25],
> o_shippriority=[$26], o_custkey=[$27], o_totalprice=[$28], o_orderkey=[$29],
> o_comment=[$30], o_orderpriority=[$31], n_comment=[$32], n_nationkey=[$33],
> n_regionkey=[$34], n_name=[$35], $f0=[CAST($38):BOOLEAN])
> EnumerableHashJoin(condition=[AND(=($11, $36), =($15, $37))],
> joinType=[inner])
> EnumerableProject(s_comment=[$0], s_phone=[$1],
> s_nationkey=[$2], s_name=[$3], s_address=[$4], s_acctbal=[$5],
> s_suppkey=[$6], l_returnflag=[$7], l_receiptdate=[$8], l_tax=[$9],
> l_shipmode=[$10], l_suppkey=[$11], l_shipdate=[$12], l_commitdate=[$13],
> l_partkey=[$14], l_orderkey=[$15], l_quantity=[$16], l_linestatus=[$17],
> l_comment=[$18], l_extendedprice=[$19], l_linenumber=[$20], l_discount=[$21],
> l_shipinstruct=[$22], o_orderstatus=[$27], o_clerk=[$28], o_orderdate=[$29],
> o_shippriority=[$30], o_custkey=[$31], o_totalprice=[$32], o_orderkey=[$33],
> o_comment=[$34], o_orderpriority=[$35], n_comment=[$23], n_nationkey=[$24],
> n_regionkey=[$25], n_name=[$26])
> EnumerableHashJoin(condition=[=($15, $33)],
> joinType=[inner])
> EnumerableProject(s_comment=[$0], s_phone=[$1],
> s_nationkey=[$2], s_name=[$3], s_address=[$4], s_acctbal=[$5],
> s_suppkey=[$6], l_returnflag=[$11], l_receiptdate=[$12], l_tax=[$13],
> l_shipmode=[$14], l_suppkey=[$15], l_shipdate=[$16], l_commitdate=[$17],
> l_partkey=[$18], l_orderkey=[$19], l_quantity=[$20], l_linestatus=[$21],
> l_comment=[$22], l_extendedprice=[$23], l_linenumber=[$24], l_discount=[$25],
> l_shipinstruct=[$26], n_comment=[$7], n_nationkey=[$8], n_regionkey=[$9],
> n_name=[$10])
> EnumerableHashJoin(condition=[=($6, $15)],
> joinType=[inner])
> EnumerableHashJoin(condition=[=($2, $8)],
> joinType=[inner])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> supplier]])
> EnumerableFilter(condition=[=($3, 'BRAZIL')])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> nation]])
> EnumerableFilter(condition=[>($1, $6)])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> lineitem]])
> EnumerableFilter(condition=[=($0, 'F')])
> EnumerableTableScan(table=[[tpch-sf0.01/csv, orders]])
> EnumerableAggregate(group=[\\{0, 1}], agg#0=[MIN($2)])
> EnumerableProject(l_suppkey0=[$16], l_orderkey0=[$17],
> $f0=[true])
> EnumerableFilter(condition=[<>($4, $16)])
> EnumerableHashJoin(condition=[=($8, $17)],
> joinType=[inner])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> lineitem]])
> EnumerableAggregate(group=[\\{0, 1}])
> EnumerableProject(l_suppkey=[$11], l_orderkey=[$15])
> EnumerableNestedLoopJoin(condition=[true],
> joinType=[inner])
> EnumerableNestedLoopJoin(condition=[true],
> joinType=[inner])
> EnumerableNestedLoopJoin(condition=[true],
> joinType=[inner])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, supplier]])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, lineitem]])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> orders]])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> nation]])
> EnumerableAggregate(group=[\\{0, 1}], agg#0=[MIN($2)])
> EnumerableProject(l_suppkey0=[$16], l_orderkey0=[$17],
> $f0=[true])
> EnumerableFilter(condition=[<>($4, $16)])
> EnumerableHashJoin(condition=[=($8, $17)], joinType=[inner])
> EnumerableFilter(condition=[>($1, $6)])
> EnumerableTableScan(table=[[tpch-sf0.01/csv, lineitem]])
> EnumerableAggregate(group=[\\{0, 1}])
> EnumerableProject(l_suppkey=[$11], l_orderkey=[$15])
> EnumerableHashJoin(condition=[AND(=($11, $36), =($15,
> $37))], joinType=[semi])
> EnumerableProject(s_comment=[$0], s_phone=[$1],
> s_nationkey=[$2], s_name=[$3], s_address=[$4], s_acctbal=[$5],
> s_suppkey=[$6], l_returnflag=[$7], l_receiptdate=[$8], l_tax=[$9],
> l_shipmode=[$10], l_suppkey=[$11], l_shipdate=[$12], l_commitdate=[$13],
> l_partkey=[$14], l_orderkey=[$15], l_quantity=[$16], l_linestatus=[$17],
> l_comment=[$18], l_extendedprice=[$19], l_linenumber=[$20], l_discount=[$21],
> l_shipinstruct=[$22], o_orderstatus=[$27], o_clerk=[$28], o_orderdate=[$29],
> o_shippriority=[$30], o_custkey=[$31], o_totalprice=[$32], o_orderkey=[$33],
> o_comment=[$34], o_orderpriority=[$35], n_comment=[$23], n_nationkey=[$24],
> n_regionkey=[$25], n_name=[$26])
> EnumerableHashJoin(condition=[=($15, $33)],
> joinType=[inner])
> EnumerableProject(s_comment=[$0], s_phone=[$1],
> s_nationkey=[$2], s_name=[$3], s_address=[$4], s_acctbal=[$5],
> s_suppkey=[$6], l_returnflag=[$11], l_receiptdate=[$12], l_tax=[$13],
> l_shipmode=[$14], l_suppkey=[$15], l_shipdate=[$16], l_commitdate=[$17],
> l_partkey=[$18], l_orderkey=[$19], l_quantity=[$20], l_linestatus=[$21],
> l_comment=[$22], l_extendedprice=[$23], l_linenumber=[$24], l_discount=[$25],
> l_shipinstruct=[$26], n_comment=[$7], n_nationkey=[$8], n_regionkey=[$9],
> n_name=[$10])
> EnumerableHashJoin(condition=[=($6, $15)],
> joinType=[inner])
> EnumerableHashJoin(condition=[=($2, $8)],
> joinType=[inner])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, supplier]])
> EnumerableFilter(condition=[=($3,
> 'BRAZIL')])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, nation]])
> EnumerableFilter(condition=[>($1, $6)])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, lineitem]])
> EnumerableFilter(condition=[=($0, 'F')])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> orders]])
> EnumerableProject(l_suppkey0=[$16],
> l_orderkey0=[$17], $f0=[true])
> EnumerableFilter(condition=[<>($4, $16)])
> EnumerableHashJoin(condition=[=($8, $17)],
> joinType=[inner])
> EnumerableTableScan(table=[[tpch-sf0.01/csv,
> lineitem]])
> EnumerableAggregate(group=[\\{0, 1}])
> EnumerableProject(l_suppkey=[$11],
> l_orderkey=[$15])
>
> EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
>
> EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
>
> EnumerableNestedLoopJoin(condition=[true], joinType=[inner])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, supplier]])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, lineitem]])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, orders]])
>
> EnumerableTableScan(table=[[tpch-sf0.01/csv, nation]])
> {quote}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)