[ https://issues.apache.org/jira/browse/CALCITE-468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16323516#comment-16323516 ]
Julian Hyde commented on CALCITE-468: ------------------------------------- See also CALCITE-1317, which is a form of WinMagic. > Introduce semi join reduction optimization in Calcite > ------------------------------------------------------ > > Key: CALCITE-468 > URL: https://issues.apache.org/jira/browse/CALCITE-468 > Project: Calcite > Issue Type: Bug > Reporter: Mostafa Mokhtar > Assignee: Julian Hyde > Labels: hive > Attachments: BaselineTree.png, SemiJoinReductionGains.png, > SemiJoinReductionTreee.png > > > The basic idea is to apply join predicates early in a plan in order to reduce > the size of intermediate query results and, thus, reduce the cost of other > operations. In other words, the idea is to apply the same join predicates > twice or more often in a query plan > In order to reduce the communication costs of a distributed system. > Obviously, semi-join reducers are only effective if the (redundant) > semi-joins are cheap and result in a significant reduction of the size of > intermediate > query results. > I propose to extend a query optimizer and integrate semi-join reducer and > join-ordering, etc. into a single query optimization step > Several TPC-DS queries like 24, 64 & 80 run very slow do to the lake of semi > join reduction optimization in Calcite. > Doing a rewrite of Q64 to simulate semi join reduction produced 4x gains. > {code} > Query Total time CPU Intermediate rows > (Million) > Baseline 1,377 356,900 > 23,940 > Semi Join Reduction 343 47,253 > 23 > {code} > Q64 subset > {code} > select > count(*) > FROM > store_sales > JOIN > item ON store_sales.ss_item_sk = item.i_item_sk > JOIN > store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk > JOIN > (select > cs_item_sk > from > catalog_sales > JOIN catalog_returns ON catalog_sales.cs_item_sk = > catalog_returns.cr_item_sk > group by cs_item_sk > having > sum(cs_ext_list_price) > 2 * sum(cr_refunded_cash + cr_reversed_charge + > cr_store_credit)) cs_ui > ON store_sales.ss_item_sk = cs_ui.cs_item_sk > WHERE > i_color in ('maroon' , 'burnished', > 'dim', > 'steel', > 'navajo', > 'chocolate') > and i_current_price between 35 and 35 + 10 > and i_current_price between 35 + 1 and 35 + 15 > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)