[ 
https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17928613#comment-17928613
 ] 

Silun Dong commented on CALCITE-6846:
-------------------------------------

The source of the dphyp algorithm is the paper: [SIGMOD 2008] Dynamic 
Programming Strikes Back
It is a join reorder algorithm based on dynamic programming. In theory, it can 
handle complex join predicates and various join type(outer, semi, etc.). In 
terms of code implementation, I referred to Doris and implement the main 
enumeration process of the dphyp algorithm under the Calcite framework. Two new 
rules were added, one for converting Join to HyperGraph, the other for matching 
HyperGraph and performing the dphyp algorithm. Current pr can handle complex 
predicates in inner join scenarios. In the future, it should be expanded to 
handle more join types.

> Support basic dphyp join reorder algorithm
> ------------------------------------------
>
>                 Key: CALCITE-6846
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6846
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.38.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>
> Supports the basic dphyp join reorder algorithm.
> For example :
> {code:java}
> SELECT
>     i_item_id
> FROM store_sales, customer_demographics, date_dim, item, promotion
> WHERE ss_sold_date_sk = d_date_sk AND
>     ss_item_sk = i_item_sk AND
>     ss_cdemo_sk = cd_demo_sk AND
>     ss_promo_sk = p_promo_sk {code}
> The plan tree after pushing down filter :
> {code:java}
> LogicalProject(i_item_id=[$61])
>   LogicalJoin(condition=[=($7, $82)], joinType=[inner])
>     LogicalJoin(condition=[=($1, $60)], joinType=[inner])
>       LogicalJoin(condition=[=($22, $32)], joinType=[inner])
>         LogicalJoin(condition=[=($3, $23)], joinType=[inner])
>           LogicalTableScan(table=[[tpcds, store_sales]])
>           LogicalTableScan(table=[[tpcds, customer_demographics]])
>         LogicalTableScan(table=[[tpcds, date_dim]])
>       LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]){code}
> Convert Joins into one HyperGraph :
> {code:java}
> LogicalProject(i_item_id=[$61])
>   
> HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
>     LogicalTableScan(table=[[tpcds, store_sales]])
>     LogicalTableScan(table=[[tpcds, customer_demographics]])
>     LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]) {code}
> After dphyp join reorder (with trimming fields and pushing down Project), the 
> plan is :
> {code:java}
> LogicalProject(i_item_id=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[inner])
>     LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
>       LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>         LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
>           LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>             LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], 
> ss_sold_date_sk=[$3])
>               LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>                 LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], 
> ss_promo_sk=[$7], ss_sold_date_sk=[$22])
>                   LogicalTableScan(table=[[tpcds, store_sales]])
>                 LogicalProject(p_promo_sk=[$0])
>                   LogicalTableScan(table=[[tpcds, promotion]])
>             LogicalProject(i_item_sk=[$0], i_item_id=[$1])
>               LogicalTableScan(table=[[tpcds, item]])
>         LogicalProject(d_date_sk=[$0])
>           LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalProject(cd_demo_sk=[$0])
>       LogicalTableScan(table=[[tpcds, customer_demographics]]) {code}
> The main enumeration process of dphyp will be implemented in pr. However, it 
> only can process inner join for now and the simplification of hypergraph has 
> not yet been implemented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to