The JIRA is: https://issues.apache.org/jira/browse/CALCITE-4375
Regards! Aron Tao JiaTao Tao <[email protected]> 于2021年3月2日周二 上午11:39写道: > Hi > > I met the same problem and you can take a look at this JIRA. > I also pull a request, but it does not get merged: > https://github.com/apache/calcite/pull/2253 > This optimization I think is very common and useful, I'll continue this > work soon. > > cc Julian > > Regards! > > Aron Tao > > > Priyendra Deshwal <[email protected]> 于2021年3月1日周一 下午4:52写道: > >> That rule was already enabled for me. I also tried enabling a collection >> of >> other *_REDUCE_EXPRESSION rules but it did not help. Reading the >> underlying >> code, it suggests that these rules help with constant elimination but >> perhaps do not do the kind of common expression factoring that may be >> needed for this particular case. >> >> >> On Sun, Feb 28, 2021 at 11:32 PM Fan Liya <[email protected]> wrote: >> >> > Hi Priyendra, >> > >> > We have FilterReduceExpressionsRule which reduces filter conditions. >> > It performs the simplification based on >> org.apache.calcite.rex.RexSimplify. >> > >> > Best, >> > Liya Fan >> > >> > On Mon, Mar 1, 2021 at 1:45 PM Priyendra Deshwal <[email protected]> >> > wrote: >> > >> > > Hello friends, >> > > >> > > I am playing around with TPC-DS schema and playing with the following >> > > simplified variant of query41. >> > > >> > > select distinct(i_product_name) >> > > from item i1 >> > > where i_manufact_id between 738 and 738+40 >> > > and (select count(*) as item_cnt >> > > from item >> > > where (i_manufact = i1.i_manufact and i_category = 'Women') or >> > > (i_manufact = i1.i_manufact and i_category = >> 'Men')) >> > > 0 >> > > order by i_product_name >> > > limit 100 >> > > >> > > This results in the following optimized plan. Note that the join >> > condition >> > > (i_manufact = i1.i_manufact) is not clearly expressed in this query >> since >> > > it is repeated in both OR clauses of the inner query. This results in >> a >> > > nested loop join and even the filter on i_category is not pushed all >> the >> > > way down to the query. >> > > >> > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = >> > > 2.929266516286664E9, id = 408 >> > > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 5.4675E7, >> cumulative >> > > cost = 2.929266416286664E9, id = 406 >> > > EnumerableAggregate(group=[{2}]): rowcount = 5.4675E7, cumulative >> > cost >> > > = 2.874591416286664E9, id = 404 >> > > EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)], >> > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5, >> > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)], >> > proj#0..4=[{exprs}], >> > > $condition=[$t11]): rowcount = 5.4675E8, cumulative cost = >> > > 2.819916416286664E9, id = 410 >> > > EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]): >> > > rowcount = 2.187E9, cumulative cost = 2.273166416286664E9, id = 400 >> > > EnumerableInterpreter: rowcount = 18000.0, cumulative cost = >> > > 9000.0, id = 386 >> > > BindableTableScan(table=[[default, ITEM]], >> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14, >> > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 360 >> > > EnumerableAggregate(group=[{2}], ITEM_CNT=[COUNT()]): >> rowcount >> > = >> > > 810000.0, cumulative cost = 8.193105E7, id = 398 >> > > EnumerableNestedLoopJoin(condition=[OR(AND(=($1, $2), >> =($0, >> > > 'Women')), AND(=($1, $2), =($0, 'Men')))], joinType=[inner]): >> rowcount = >> > > 8100000.0, cumulative cost = 8.10198E7, id = 396 >> > > EnumerableInterpreter: rowcount = 18000.0, cumulative >> cost >> > = >> > > 9000.0, id = 389 >> > > BindableTableScan(table=[[default, ITEM]], >> projects=[[12, >> > > 14]]): rowcount = 18000.0, cumulative cost = 29.999999999999996, id = >> 231 >> > > EnumerableAggregate(group=[{0}]): rowcount = 1800.0, >> > > cumulative cost = 10800.0, id = 394 >> > > EnumerableInterpreter: rowcount = 18000.0, cumulative >> > cost >> > > = 9000.0, id = 392 >> > > BindableTableScan(table=[[default, ITEM]], >> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[14]]): >> > > rowcount = 18000.0, cumulative cost = 11.25, id = 364 >> > > >> > > A simple rewrite of the query as follows where we "factor" out the >> join >> > > condition to the top level AND does make the plan significantly better >> > > (hash join with complete filter push down to the source). >> > > >> > > select distinct(i_product_name) >> > > from item i1 >> > > where i_manufact_id between 738 and 738+40 >> > > and (select count(*) as item_cnt >> > > from item >> > > where i_manufact = i1.i_manufact and >> > > (i_category = 'Women' or i_category = 'Men')) > 0 >> > > order by i_product_name >> > > limit 100 >> > > >> > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost = >> > > 6523491.28666381, id = 336 >> > > EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 121500.0, >> cumulative >> > > cost = 6523391.28666381, id = 334 >> > > EnumerableAggregate(group=[{2}]): rowcount = 121500.0, cumulative >> > cost >> > > = 6401891.28666381, id = 332 >> > > EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)], >> > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5, >> > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)], >> > proj#0..4=[{exprs}], >> > > $condition=[$t11]): rowcount = 1215000.0, cumulative cost = >> > > 6280391.28666381, id = 338 >> > > EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]): >> > > rowcount = 4860000.0, cumulative cost = 5065391.28666381, id = 328 >> > > EnumerableInterpreter: rowcount = 18000.0, cumulative cost = >> > > 9000.0, id = 321 >> > > BindableTableScan(table=[[default, ITEM]], >> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14, >> > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 298 >> > > EnumerableAggregate(group=[{0}], ITEM_CNT=[COUNT()]): >> rowcount >> > = >> > > 1800.0, cumulative cost = 11025.0, id = 326 >> > > EnumerableInterpreter: rowcount = 18000.0, cumulative >> cost = >> > > 9000.0, id = 324 >> > > BindableTableScan(table=[[default, ITEM]], >> > > filters=[[OR(=($12, 'Women'), =($12, 'Men'))]], projects=[[14]]): >> > rowcount >> > > = 18000.0, cumulative cost = 11.25, id = 302 >> > > >> > > Is there any built-in Calcite rule that I can invoke which can do this >> > type >> > > of factorization of common where sub-conditions? >> > > >> > > Thanks! >> > > >> > >> >
