Hi I met the same problem and you can take a look at this JIRA. I also pull a request, but it does not get merged: https://github.com/apache/calcite/pull/2253 This optimization I think is very common and useful, I'll continue this work soon.
cc Julian Regards! Aron Tao Priyendra Deshwal <[email protected]> 于2021年3月1日周一 下午4:52写道: > That rule was already enabled for me. I also tried enabling a collection of > other *_REDUCE_EXPRESSION rules but it did not help. Reading the underlying > code, it suggests that these rules help with constant elimination but > perhaps do not do the kind of common expression factoring that may be > needed for this particular case. > > > On Sun, Feb 28, 2021 at 11:32 PM Fan Liya <[email protected]> wrote: > > > Hi Priyendra, > > > > We have FilterReduceExpressionsRule which reduces filter conditions. > > It performs the simplification based on > org.apache.calcite.rex.RexSimplify. > > > > Best, > > Liya Fan > > > > On Mon, Mar 1, 2021 at 1:45 PM Priyendra Deshwal <[email protected]> > > wrote: > > > > > Hello friends, > > > > > > I am playing around with TPC-DS schema and playing with the following > > > simplified variant of query41. > > > > > > select distinct(i_product_name) > > > from item i1 > > > where i_manufact_id between 738 and 738+40 > > > and (select count(*) as item_cnt > > > from item > > > where (i_manufact = i1.i_manufact and i_category = 'Women') or > > > (i_manufact = i1.i_manufact and i_category = 'Men')) > > > 0 > > > order by i_product_name > > > limit 100 > > > > > > This results in the following optimized plan. Note that the join > > condition > > > (i_manufact = i1.i_manufact) is not clearly expressed in this query > since > > > it is repeated in both OR clauses of the inner query. This results in a > > > nested loop join and even the filter on i_category is not pushed all > the > > > way down to the query. > > > > > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = > > > 2.929266516286664E9, id = 408 > > > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 5.4675E7, > cumulative > > > cost = 2.929266416286664E9, id = 406 > > > EnumerableAggregate(group=[{2}]): rowcount = 5.4675E7, cumulative > > cost > > > = 2.874591416286664E9, id = 404 > > > EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)], > > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5, > > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)], > > proj#0..4=[{exprs}], > > > $condition=[$t11]): rowcount = 5.4675E8, cumulative cost = > > > 2.819916416286664E9, id = 410 > > > EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]): > > > rowcount = 2.187E9, cumulative cost = 2.273166416286664E9, id = 400 > > > EnumerableInterpreter: rowcount = 18000.0, cumulative cost = > > > 9000.0, id = 386 > > > BindableTableScan(table=[[default, ITEM]], > > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14, > > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 360 > > > EnumerableAggregate(group=[{2}], ITEM_CNT=[COUNT()]): > rowcount > > = > > > 810000.0, cumulative cost = 8.193105E7, id = 398 > > > EnumerableNestedLoopJoin(condition=[OR(AND(=($1, $2), =($0, > > > 'Women')), AND(=($1, $2), =($0, 'Men')))], joinType=[inner]): rowcount > = > > > 8100000.0, cumulative cost = 8.10198E7, id = 396 > > > EnumerableInterpreter: rowcount = 18000.0, cumulative > cost > > = > > > 9000.0, id = 389 > > > BindableTableScan(table=[[default, ITEM]], > projects=[[12, > > > 14]]): rowcount = 18000.0, cumulative cost = 29.999999999999996, id = > 231 > > > EnumerableAggregate(group=[{0}]): rowcount = 1800.0, > > > cumulative cost = 10800.0, id = 394 > > > EnumerableInterpreter: rowcount = 18000.0, cumulative > > cost > > > = 9000.0, id = 392 > > > BindableTableScan(table=[[default, ITEM]], > > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[14]]): > > > rowcount = 18000.0, cumulative cost = 11.25, id = 364 > > > > > > A simple rewrite of the query as follows where we "factor" out the join > > > condition to the top level AND does make the plan significantly better > > > (hash join with complete filter push down to the source). > > > > > > select distinct(i_product_name) > > > from item i1 > > > where i_manufact_id between 738 and 738+40 > > > and (select count(*) as item_cnt > > > from item > > > where i_manufact = i1.i_manufact and > > > (i_category = 'Women' or i_category = 'Men')) > 0 > > > order by i_product_name > > > limit 100 > > > > > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = > > > 6523491.28666381, id = 336 > > > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 121500.0, > cumulative > > > cost = 6523391.28666381, id = 334 > > > EnumerableAggregate(group=[{2}]): rowcount = 121500.0, cumulative > > cost > > > = 6401891.28666381, id = 332 > > > EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)], > > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5, > > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)], > > proj#0..4=[{exprs}], > > > $condition=[$t11]): rowcount = 1215000.0, cumulative cost = > > > 6280391.28666381, id = 338 > > > EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]): > > > rowcount = 4860000.0, cumulative cost = 5065391.28666381, id = 328 > > > EnumerableInterpreter: rowcount = 18000.0, cumulative cost = > > > 9000.0, id = 321 > > > BindableTableScan(table=[[default, ITEM]], > > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14, > > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 298 > > > EnumerableAggregate(group=[{0}], ITEM_CNT=[COUNT()]): > rowcount > > = > > > 1800.0, cumulative cost = 11025.0, id = 326 > > > EnumerableInterpreter: rowcount = 18000.0, cumulative cost > = > > > 9000.0, id = 324 > > > BindableTableScan(table=[[default, ITEM]], > > > filters=[[OR(=($12, 'Women'), =($12, 'Men'))]], projects=[[14]]): > > rowcount > > > = 18000.0, cumulative cost = 11.25, id = 302 > > > > > > Is there any built-in Calcite rule that I can invoke which can do this > > type > > > of factorization of common where sub-conditions? > > > > > > Thanks! > > > > > >
