The JIRA is: https://issues.apache.org/jira/browse/CALCITE-4375


Regards!

Aron Tao


JiaTao Tao <[email protected]> 于2021年3月2日周二 上午11:39写道:

> Hi
>
> I met the same problem and you can take a look at this JIRA.
> I also pull a request, but it does not get merged:
> https://github.com/apache/calcite/pull/2253
> This optimization I think is very common and useful, I'll continue this
> work soon.
>
> cc Julian
>
> Regards!
>
> Aron Tao
>
>
> Priyendra Deshwal <[email protected]> 于2021年3月1日周一 下午4:52写道:
>
>> That rule was already enabled for me. I also tried enabling a collection
>> of
>> other *_REDUCE_EXPRESSION rules but it did not help. Reading the
>> underlying
>> code, it suggests that these rules help with constant elimination but
>> perhaps do not do the kind of common expression factoring that may be
>> needed for this particular case.
>>
>>
>> On Sun, Feb 28, 2021 at 11:32 PM Fan Liya <[email protected]> wrote:
>>
>> > Hi Priyendra,
>> >
>> > We have FilterReduceExpressionsRule which reduces filter conditions.
>> > It performs the simplification based on
>> org.apache.calcite.rex.RexSimplify.
>> >
>> > Best,
>> > Liya Fan
>> >
>> > On Mon, Mar 1, 2021 at 1:45 PM Priyendra Deshwal <[email protected]>
>> > wrote:
>> >
>> > > Hello friends,
>> > >
>> > > I am playing around with TPC-DS schema and playing with the following
>> > > simplified variant of query41.
>> > >
>> > > select  distinct(i_product_name)
>> > >  from item i1
>> > >  where i_manufact_id between 738 and 738+40
>> > >    and (select count(*) as item_cnt
>> > >         from item
>> > >         where (i_manufact = i1.i_manufact and i_category = 'Women') or
>> > >                    (i_manufact = i1.i_manufact and i_category =
>> 'Men'))
>> > > 0
>> > >  order by i_product_name
>> > >  limit 100
>> > >
>> > > This results in the following optimized plan. Note that the join
>> > condition
>> > > (i_manufact = i1.i_manufact) is not clearly expressed in this query
>> since
>> > > it is repeated in both OR clauses of the inner query. This results in
>> a
>> > > nested loop join and even the filter on i_category is not pushed all
>> the
>> > > way down to the query.
>> > >
>> > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost =
>> > > 2.929266516286664E9, id = 408
>> > >   EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 5.4675E7,
>> cumulative
>> > > cost = 2.929266416286664E9, id = 406
>> > >     EnumerableAggregate(group=[{2}]): rowcount = 5.4675E7, cumulative
>> > cost
>> > > = 2.874591416286664E9, id = 404
>> > >       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)],
>> > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5,
>> > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)],
>> > proj#0..4=[{exprs}],
>> > > $condition=[$t11]): rowcount = 5.4675E8, cumulative cost =
>> > > 2.819916416286664E9, id = 410
>> > >         EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]):
>> > > rowcount = 2.187E9, cumulative cost = 2.273166416286664E9, id = 400
>> > >           EnumerableInterpreter: rowcount = 18000.0, cumulative cost =
>> > > 9000.0, id = 386
>> > >             BindableTableScan(table=[[default, ITEM]],
>> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14,
>> > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 360
>> > >           EnumerableAggregate(group=[{2}], ITEM_CNT=[COUNT()]):
>> rowcount
>> > =
>> > > 810000.0, cumulative cost = 8.193105E7, id = 398
>> > >             EnumerableNestedLoopJoin(condition=[OR(AND(=($1, $2),
>> =($0,
>> > > 'Women')), AND(=($1, $2), =($0, 'Men')))], joinType=[inner]):
>> rowcount =
>> > > 8100000.0, cumulative cost = 8.10198E7, id = 396
>> > >               EnumerableInterpreter: rowcount = 18000.0, cumulative
>> cost
>> > =
>> > > 9000.0, id = 389
>> > >                 BindableTableScan(table=[[default, ITEM]],
>> projects=[[12,
>> > > 14]]): rowcount = 18000.0, cumulative cost = 29.999999999999996, id =
>> 231
>> > >               EnumerableAggregate(group=[{0}]): rowcount = 1800.0,
>> > > cumulative cost = 10800.0, id = 394
>> > >                 EnumerableInterpreter: rowcount = 18000.0, cumulative
>> > cost
>> > > = 9000.0, id = 392
>> > >                   BindableTableScan(table=[[default, ITEM]],
>> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[14]]):
>> > > rowcount = 18000.0, cumulative cost = 11.25, id = 364
>> > >
>> > > A simple rewrite of the query as follows where we "factor" out the
>> join
>> > > condition to the top level AND does make the plan significantly better
>> > > (hash join with complete filter push down to the source).
>> > >
>> > > select  distinct(i_product_name)
>> > >  from item i1
>> > >  where i_manufact_id between 738 and 738+40
>> > >    and (select count(*) as item_cnt
>> > >         from item
>> > >         where i_manufact = i1.i_manufact and
>> > >                    (i_category = 'Women' or i_category = 'Men')) > 0
>> > >  order by i_product_name
>> > >  limit 100
>> > >
>> > > EnumerableLimit(fetch=[100]): rowcount = 100.0, cumulative cost =
>> > > 6523491.28666381, id = 336
>> > >   EnumerableSort(sort0=[$0], dir0=[ASC]): rowcount = 121500.0,
>> cumulative
>> > > cost = 6523391.28666381, id = 334
>> > >     EnumerableAggregate(group=[{2}]): rowcount = 121500.0, cumulative
>> > cost
>> > > = 6401891.28666381, id = 332
>> > >       EnumerableCalc(expr#0..4=[{inputs}], expr#5=[IS NULL($t4)],
>> > > expr#6=[0:BIGINT], expr#7=[0], expr#8=[>($t6, $t7)], expr#9=[AND($t5,
>> > > $t8)], expr#10=[>($t4, $t7)], expr#11=[OR($t9, $t10)],
>> > proj#0..4=[{exprs}],
>> > > $condition=[$t11]): rowcount = 1215000.0, cumulative cost =
>> > > 6280391.28666381, id = 338
>> > >         EnumerableHashJoin(condition=[=($1, $3)], joinType=[left]):
>> > > rowcount = 4860000.0, cumulative cost = 5065391.28666381, id = 328
>> > >           EnumerableInterpreter: rowcount = 18000.0, cumulative cost =
>> > > 9000.0, id = 321
>> > >             BindableTableScan(table=[[default, ITEM]],
>> > > filters=[[AND(>=($13, 738), <=($13, +(738, 40)))]], projects=[[13, 14,
>> > > 21]]): rowcount = 18000.0, cumulative cost = 18.75, id = 298
>> > >           EnumerableAggregate(group=[{0}], ITEM_CNT=[COUNT()]):
>> rowcount
>> > =
>> > > 1800.0, cumulative cost = 11025.0, id = 326
>> > >             EnumerableInterpreter: rowcount = 18000.0, cumulative
>> cost =
>> > > 9000.0, id = 324
>> > >               BindableTableScan(table=[[default, ITEM]],
>> > > filters=[[OR(=($12, 'Women'), =($12, 'Men'))]], projects=[[14]]):
>> > rowcount
>> > > = 18000.0, cumulative cost = 11.25, id = 302
>> > >
>> > > Is there any built-in Calcite rule that I can invoke which can do this
>> > type
>> > > of factorization of common where sub-conditions?
>> > >
>> > > Thanks!
>> > >
>> >
>>
>

Reply via email to