[ https://issues.apache.org/jira/browse/CALCITE-468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Julian Hyde reassigned CALCITE-468:
-----------------------------------
Assignee: Julian Hyde (was: Laljo John Pullokkaran)
> Introduce semi join reduction optimization in Calcite
> ------------------------------------------------------
>
> Key: CALCITE-468
> URL: https://issues.apache.org/jira/browse/CALCITE-468
> Project: Calcite
> Issue Type: Bug
> Reporter: Mostafa Mokhtar
> Assignee: Julian Hyde
> Labels: hive
> Attachments: BaselineTree.png, SemiJoinReductionGains.png,
> SemiJoinReductionTreee.png
>
>
> The basic idea is to apply join predicates early in a plan in order to reduce
> the size of intermediate query results and, thus, reduce the cost of other
> operations. In other words, the idea is to apply the same join predicates
> two or more times in a query plan in order to reduce the communication costs
> of a distributed system.
> Obviously, semi-join reducers are only effective if the (redundant)
> semi-joins are cheap and result in a significant reduction of the size of
> intermediate query results.
> I propose to extend the query optimizer and integrate semi-join reduction,
> join ordering, etc. into a single query optimization step.
> Several TPC-DS queries, such as 24, 64 and 80, run very slowly due to the
> lack of semi-join reduction optimization in Calcite.
> Rewriting Q64 to simulate semi-join reduction produced a 4x gain in total
> time.
> {code}
> Query                 Total time   CPU       Intermediate rows (Million)
> Baseline              1,377        356,900   23,940
> Semi Join Reduction   343          47,253    23
> {code}
> Q64 subset
> {code}
> select
>   count(*)
> FROM
>   store_sales
>   JOIN item ON store_sales.ss_item_sk = item.i_item_sk
>   JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk
>   JOIN (
>     select
>       cs_item_sk
>     from
>       catalog_sales
>       JOIN catalog_returns ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
>     group by
>       cs_item_sk
>     having
>       sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + cr_store_credit)
>   ) cs_ui ON store_sales.ss_item_sk = cs_ui.cs_item_sk
> WHERE
>   i_color in ('maroon', 'burnished', 'dim', 'steel', 'navajo', 'chocolate')
>   and i_current_price between 35 and 35 + 10
>   and i_current_price between 35 + 1 and 35 + 15
> {code}
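
To make the idea in the description concrete, here is a minimal sketch of a
manual semi-join reduction on a toy subset of the Q64 tables. It is not the
reporter's actual rewrite, which is not included in the issue. The table
contents, the single 'maroon' filter, and the helper names (build_demo_db,
scalar) are assumptions made for illustration, and SQLite stands in for a real
engine only so that the example runs anywhere. The filter on item is applied to
store_sales and store_returns as an IN subquery (a semi-join) before the two
fact tables are joined, and is then applied again in the final join.

```python
import sqlite3

# Baseline: the fact tables are joined first, the item filter is applied last.
BASELINE = """
SELECT count(*)
FROM store_sales
JOIN store_returns ON ss_item_sk = sr_item_sk
JOIN item ON ss_item_sk = i_item_sk
WHERE i_color = 'maroon'
"""

# Semi-join reduction: the same predicate is applied early to both fact
# tables (redundantly), and once more in the final join.
REDUCED = """
SELECT count(*)
FROM (SELECT ss_item_sk FROM store_sales
      WHERE ss_item_sk IN (SELECT i_item_sk FROM item
                           WHERE i_color = 'maroon')) ss
JOIN (SELECT sr_item_sk FROM store_returns
      WHERE sr_item_sk IN (SELECT i_item_sk FROM item
                           WHERE i_color = 'maroon')) sr
  ON ss.ss_item_sk = sr.sr_item_sk
JOIN item ON ss.ss_item_sk = i_item_sk
WHERE i_color = 'maroon'
"""

# Logical size of the store_sales x store_returns intermediate result,
# without and with the early semi-join filters.
UNREDUCED_INTERMEDIATE = """
SELECT count(*)
FROM store_sales
JOIN store_returns ON ss_item_sk = sr_item_sk
"""

REDUCED_INTERMEDIATE = """
SELECT count(*)
FROM store_sales
JOIN store_returns ON ss_item_sk = sr_item_sk
WHERE ss_item_sk IN (SELECT i_item_sk FROM item WHERE i_color = 'maroon')
  AND sr_item_sk IN (SELECT i_item_sk FROM item WHERE i_color = 'maroon')
"""


def build_demo_db():
    """Create an in-memory database with hypothetical toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item (i_item_sk INTEGER PRIMARY KEY, i_color TEXT);
        CREATE TABLE store_sales (ss_item_sk INTEGER);
        CREATE TABLE store_returns (sr_item_sk INTEGER);
    """)
    # 100 items, of which only items 1..5 pass the color filter.
    conn.executemany(
        "INSERT INTO item VALUES (?, ?)",
        [(i, "maroon" if i <= 5 else "white") for i in range(1, 101)],
    )
    # 10 sales rows and 3 returns rows per item.
    conn.executemany(
        "INSERT INTO store_sales VALUES (?)",
        [(i,) for i in range(1, 101) for _ in range(10)],
    )
    conn.executemany(
        "INSERT INTO store_returns VALUES (?)",
        [(i,) for i in range(1, 101) for _ in range(3)],
    )
    conn.commit()
    return conn


def scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


if __name__ == "__main__":
    conn = build_demo_db()
    print("baseline result:  ", scalar(conn, BASELINE))
    print("reduced result:   ", scalar(conn, REDUCED))
    print("intermediate rows:", scalar(conn, UNREDUCED_INTERMEDIATE),
          "->", scalar(conn, REDUCED_INTERMEDIATE))
```

Both queries return the same count, while the logical size of the
store_sales/store_returns join shrinks in proportion to the selectivity of the
item filter. The intermediate counts here describe logical result sizes only;
an engine's optimizer may reorder the joins on its own, so the sketch
illustrates the rewrite rather than measuring any particular plan.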
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)