That rule was already enabled for me. I also tried enabling a collection of
other *_REDUCE_EXPRESSION rules but it did not help. Reading the underlying
code, it suggests that these rules help with constant elimination but
perhaps do not do the kind of common expression factoring that may be
needed for this particular case.


On Sun, Feb 28, 2021 at 11:32 PM Fan Liya <[email protected]> wrote:

> Hi Priyendra,
>
> We have FilterReduceExpressionsRule which reduces filter conditions.
> It performs the simplification based on org.apache.calcite.rex.RexSimplify.
>
> Best,
> Liya Fan
>
> On Mon, Mar 1, 2021 at 1:45 PM Priyendra Deshwal <[email protected]>
> wrote:
>
> > Hello friends,
> >
> > I am playing around with TPC-DS schema and playing with the following
> > simplified variant of query41.
> >
> > select  distinct(i_product_name)
> >  from item i1
> >  where i_manufact_id between 738 and 738+40
> >    and (select count(*) as item_cnt
> >         from item
> >         where (i_manufact = i1.i_manufact and i_category = 'Women') or
> >                    (i_manufact = i1.i_manufact and i_category = 'Men'))
> > 0
> >  order by i_product_name
> >  limit 100
> >
> > This results in the following optimized plan. Note that the join
> condition
> > (i_manufact = i1.i_manufact) is not clearly expressed in this query since
> > it is repeated in both OR clauses of the inner query. This results in a
> > nested loop join and even the filter on i_category is not pushed all the
> > way down to the query.
> >
> > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost =
> > 2.929266516286664E9, id = 408
> >   EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 5.4675E7, cumulative
> > cost = 2.929266416286664E9, id = 406
> >     EnumerableAggregate(group=[{2}]): rowcount = 5.4675E7, cumulative
> cost
> > = 2.874591416286664E9, id = 404
> >       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)],
> > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5,
> > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)],
> proj#0..4=[{exprs}],
> > $condition=[$t11]): rowcount = 5.4675E8, cumulative cost =
> > 2.819916416286664E9, id = 410
> >         EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]):
> > rowcount = 2.187E9, cumulative cost = 2.273166416286664E9, id = 400
> >           EnumerableInterpreter: rowcount = 18000.0, cumulative cost =
> > 9000.0, id = 386
> >             BindableTableScan(table=[[default, ITEM]],
> > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14,
> > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 360
> >           EnumerableAggregate(group=[{2}], ITEM_CNT=[COUNT()]): rowcount
> =
> > 810000.0, cumulative cost = 8.193105E7, id = 398
> >             EnumerableNestedLoopJoin(condition=[OR(AND(=($1, $2), =($0,
> > 'Women')), AND(=($1, $2), =($0, 'Men')))], joinType=[inner]): rowcount =
> > 8100000.0, cumulative cost = 8.10198E7, id = 396
> >               EnumerableInterpreter: rowcount = 18000.0, cumulative cost
> =
> > 9000.0, id = 389
> >                 BindableTableScan(table=[[default, ITEM]], projects=[[12,
> > 14]]): rowcount = 18000.0, cumulative cost = 29.999999999999996, id = 231
> >               EnumerableAggregate(group=[{0}]): rowcount = 1800.0,
> > cumulative cost = 10800.0, id = 394
> >                 EnumerableInterpreter: rowcount = 18000.0, cumulative
> cost
> > = 9000.0, id = 392
> >                   BindableTableScan(table=[[default, ITEM]],
> > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[14]]):
> > rowcount = 18000.0, cumulative cost = 11.25, id = 364
> >
> > A simple rewrite of the query as follows where we "factor" out the join
> > condition to the top level AND does make the plan significantly better
> > (hash join with complete filter push down to the source).
> >
> > select  distinct(i_product_name)
> >  from item i1
> >  where i_manufact_id between 738 and 738+40
> >    and (select count(*) as item_cnt
> >         from item
> >         where i_manufact = i1.i_manufact and
> >                    (i_category = 'Women' or i_category = 'Men')) > 0
> >  order by i_product_name
> >  limit 100
> >
> > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost =
> > 6523491.28666381, id = 336
> >   EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 121500.0, cumulative
> > cost = 6523391.28666381, id = 334
> >     EnumerableAggregate(group=[{2}]): rowcount = 121500.0, cumulative
> cost
> > = 6401891.28666381, id = 332
> >       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)],
> > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5,
> > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)],
> proj#0..4=[{exprs}],
> > $condition=[$t11]): rowcount = 1215000.0, cumulative cost =
> > 6280391.28666381, id = 338
> >         EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]):
> > rowcount = 4860000.0, cumulative cost = 5065391.28666381, id = 328
> >           EnumerableInterpreter: rowcount = 18000.0, cumulative cost =
> > 9000.0, id = 321
> >             BindableTableScan(table=[[default, ITEM]],
> > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14,
> > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 298
> >           EnumerableAggregate(group=[{0}], ITEM_CNT=[COUNT()]): rowcount
> =
> > 1800.0, cumulative cost = 11025.0, id = 326
> >             EnumerableInterpreter: rowcount = 18000.0, cumulative cost =
> > 9000.0, id = 324
> >               BindableTableScan(table=[[default, ITEM]],
> > filters=[[OR(=($12, 'Women'), =($12, 'Men'))]], projects=[[14]]):
> rowcount
> > = 18000.0, cumulative cost = 11.25, id = 302
> >
> > Is there any built-in Calcite rule that I can invoke which can do this
> type
> > of factorization of common where sub-conditions?
> >
> > Thanks!
> >
>

Reply via email to