[ 
https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17933287#comment-17933287
 ] 

Alessandro Solimando commented on CALCITE-6846:
-----------------------------------------------

[~rubenql] thanks for driving the review. I have followed the discussion and I 
agree with what was decided, although I didn't have time to do a review myself, 
especially of the algorithm itself.

I have two general comments:
 - it would be good to have a comparison using a join-specific benchmark like 
the [JOB|https://github.com/gregrahn/join-order-benchmark], but I wouldn't 
consider this a blocker, as the algorithm itself comes from a very reputable 
source and well-known authors. If everything is marked as experimental, I am OK 
with having this in as it's proposed now
 - the algorithm implementation has been ported from Apache Doris, IIRC. I 
think we should acknowledge that explicitly in the code, as the paper itself 
doesn't come with an actual implementation

The acknowledgement of where the code is coming from is the only missing bit 
IMO at this stage.

> Support basic dphyp join reorder algorithm
> ------------------------------------------
>
>                 Key: CALCITE-6846
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6846
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.38.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.39.0
>
>
> Supports the basic dphyp join reorder algorithm.
> For example:
> {code:sql}
> SELECT
>     i_item_id
> FROM store_sales, customer_demographics, date_dim, item, promotion
> WHERE ss_sold_date_sk = d_date_sk AND
>     ss_item_sk = i_item_sk AND
>     ss_cdemo_sk = cd_demo_sk AND
>     ss_promo_sk = p_promo_sk {code}
> The plan tree after pushing down the filter:
> {code:java}
> LogicalProject(i_item_id=[$61])
>   LogicalJoin(condition=[=($7, $82)], joinType=[inner])
>     LogicalJoin(condition=[=($1, $60)], joinType=[inner])
>       LogicalJoin(condition=[=($22, $32)], joinType=[inner])
>         LogicalJoin(condition=[=($3, $23)], joinType=[inner])
>           LogicalTableScan(table=[[tpcds, store_sales]])
>           LogicalTableScan(table=[[tpcds, customer_demographics]])
>         LogicalTableScan(table=[[tpcds, date_dim]])
>       LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]){code}
> After converting the joins into a single HyperGraph:
> {code:java}
> LogicalProject(i_item_id=[$61])
>   HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
>     LogicalTableScan(table=[[tpcds, store_sales]])
>     LogicalTableScan(table=[[tpcds, customer_demographics]])
>     LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]) {code}
> After dphyp join reorder (with field trimming and Project pushdown), the 
> plan is:
> {code:java}
> LogicalProject(i_item_id=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[inner])
>     LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
>       LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>         LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
>           LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>             LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], ss_sold_date_sk=[$3])
>               LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>                 LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_promo_sk=[$7], ss_sold_date_sk=[$22])
>                   LogicalTableScan(table=[[tpcds, store_sales]])
>                 LogicalProject(p_promo_sk=[$0])
>                   LogicalTableScan(table=[[tpcds, promotion]])
>             LogicalProject(i_item_sk=[$0], i_item_id=[$1])
>               LogicalTableScan(table=[[tpcds, item]])
>         LogicalProject(d_date_sk=[$0])
>           LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalProject(cd_demo_sk=[$0])
>       LogicalTableScan(table=[[tpcds, customer_demographics]]) {code}
> The main enumeration process of dphyp is implemented in the PR. However, it 
> can only process inner joins for now, and simplification of the hypergraph 
> has not yet been implemented.
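
For illustration only, and not the code from the PR: a minimal sketch of how the star-shaped hyperedges in the HyperGraph output above could be represented as pairs of bitmasks, together with the connectivity test that lets a dphyp-style enumeration skip cross products. The names HyperGraphSketch, Edge, rel, STAR and connected are hypothetical, and the relation numbering 0..4 is assumed to follow the input order shown in the plan.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch; none of these names come from the Calcite PR.
final class HyperGraphSketch {
  /** A hyperedge between two disjoint relation sets, each encoded as a bitmask. */
  static final class Edge {
    final long left;
    final long right;

    Edge(long left, long right) {
      this.left = left;
      this.right = right;
    }
  }

  /** Bitmask containing only relation i. */
  static long rel(int i) {
    return 1L << i;
  }

  /** Edges of the example: relation 0 (store_sales) joins each of 1..4. */
  static final List<Edge> STAR = Arrays.asList(
      new Edge(rel(0), rel(1)),
      new Edge(rel(0), rel(2)),
      new Edge(rel(0), rel(3)),
      new Edge(rel(0), rel(4)));

  /**
   * Returns true if some edge has one endpoint fully contained in a and the
   * other fully contained in b, so a and b can be joined without a cross product.
   */
  static boolean connected(List<Edge> edges, long a, long b) {
    for (Edge e : edges) {
      boolean forward = (e.left & ~a) == 0 && (e.right & ~b) == 0;
      boolean backward = (e.left & ~b) == 0 && (e.right & ~a) == 0;
      if (forward || backward) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    System.out.println(connected(STAR, rel(0), rel(1)));          // true
    System.out.println(connected(STAR, rel(1), rel(2)));          // false
    System.out.println(connected(STAR, rel(0) | rel(1), rel(2))); // true
  }
}
```

With only inner joins and binary predicates, every edge here has a single relation on each side; the subset checks are written so that the same test would also cover edges whose endpoints contain several relations.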



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
