[ 
https://issues.apache.org/jira/browse/CALCITE-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323516#comment-16323516
 ] 

Julian Hyde commented on CALCITE-468:
-------------------------------------

See also CALCITE-1317, which is a form of WinMagic.

> Introduce semi join reduction optimization in Calcite 
> ------------------------------------------------------
>
>                 Key: CALCITE-468
>                 URL: https://issues.apache.org/jira/browse/CALCITE-468
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Mostafa Mokhtar
>            Assignee: Julian Hyde
>              Labels: hive
>         Attachments: BaselineTree.png, SemiJoinReductionGains.png, 
> SemiJoinReductionTreee.png
>
>
> The basic idea is to apply join predicates early in a plan in order to reduce 
> the size of intermediate query results and, thus, reduce the cost of other 
> operations. In other words, the idea is to apply the same join predicates 
> twice or more often in a query plan
> In order to reduce the communication costs of a distributed system. 
> Obviously, semi-join reducers are only effective if the (redundant) 
> semi-joins are cheap and result in a significant reduction of the size of 
> intermediate
> query results.
> I propose to extend a query optimizer and integrate semi-join reducer and
> join-ordering, etc. into a single query optimization step
> Several TPC-DS queries like 24, 64 & 80 run very slow do to the lake of semi 
> join reduction optimization in Calcite.
> Doing a rewrite of Q64 to simulate semi join reduction produced 4x gains.
> {code}
> Query                 Total time           CPU        Intermediate rows 
> (Million)
> Baseline                      1,377    356,900                            
> 23,940
> Semi Join Reduction           343       47,253                                
> 23
> {code}
> Q64 subset 
> {code}
> select 
>     count(*)
> FROM
>     store_sales
>         JOIN
>     item ON store_sales.ss_item_sk = item.i_item_sk
>         JOIN
>     store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk
>         JOIN
>     (select 
>         cs_item_sk
>     from
>         catalog_sales
>     JOIN catalog_returns ON catalog_sales.cs_item_sk = 
> catalog_returns.cr_item_sk
>     group by cs_item_sk
>     having 
> sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + 
> cr_store_credit)) cs_ui 
> ON store_sales.ss_item_sk = cs_ui.cs_item_sk
> WHERE
>     i_color in ('maroon' , 'burnished',
>         'dim',
>         'steel',
>         'navajo',
>         'chocolate')
>         and i_current_price between 35 and 35 + 10
>         and i_current_price between 35 + 1 and 35 + 15
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to