[jira] [Commented] (CALCITE-6846) Support basic dphyp join reorder algorithm

Silun Dong (Jira) Tue, 04 Mar 2025 18:37:09 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932492#comment-17932492
 ]


Silun Dong commented on CALCITE-6846:
-------------------------------------

For the three test (testChainJoinDphypJoinReorder, 
testStarJoinDphypJoinReorder, testCycleJoinDphypJoinReorder) in this PR, I have 
respectively tried to optimize with dphyp (CoreRules.JOIN_TO_HYPER_GRAPH + 
CoreRule.HYPER_GRAPH_OPTIMIZE) and LoptOptimizeJoinRule 
(CoreRules.JOIN_TO_MULTI_JOIN + CoreRules.MULTI_JOIN_OPTIMIZE), and check the 
cost of relAfter on line 397 of RelOptFixture (I don't know of a more direct 
way to show this, so I just printed it out and verify it myself.). The 
comparison is as follows:
|| ||dphyp||LoptOptimizeJoinRule||
|chain|144.31199999999998|152.11200000000002|
|star|98.40728|184.33928|
|cycle|45.44|45.44|

DPhyp enumerates all possibilities of join graph without duplication (currently 
PR can only handle inner join scenarios), so I think:
 * When there are only inner join, dphyp must be better.
 * When inner join mixed with other types of join, it's better to use the cost 
to decide which algorithm is more efficient. If the code related dphyp is 
improved in the future to handle all join types, then dphyp will be better.
 * When there are too many Join tables, even if dphyp algorithm does not 
enumerate repeatedly, it will be very time-consuming. However, 
LoptOptimizeJoinRule can quickly get a relatively good result.

> Support basic dphyp join reorder algorithm
> ------------------------------------------
>
>                 Key: CALCITE-6846
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6846
>             Project: Calcite
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 1.38.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.39.0
>
>
> Supports the basic dphyp join reorder algorithm.
> For example :
> {code:java}
> SELECT
>     i_item_id
> FROM store_sales, customer_demographics, date_dim, item, promotion
> WHERE ss_sold_date_sk = d_date_sk AND
>     ss_item_sk = i_item_sk AND
>     ss_cdemo_sk = cd_demo_sk AND
>     ss_promo_sk = p_promo_sk {code}
> The plan tree after pushing down filter :
> {code:java}
> LogicalProject(i_item_id=[$61])
>   LogicalJoin(condition=[=($7, $82)], joinType=[inner])
>     LogicalJoin(condition=[=($1, $60)], joinType=[inner])
>       LogicalJoin(condition=[=($22, $32)], joinType=[inner])
>         LogicalJoin(condition=[=($3, $23)], joinType=[inner])
>           LogicalTableScan(table=[[tpcds, store_sales]])
>           LogicalTableScan(table=[[tpcds, customer_demographics]])
>         LogicalTableScan(table=[[tpcds, date_dim]])
>       LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]){code}
> Convert Joins into one HyperGraph :
> {code:java}
> LogicalProject(i_item_id=[$61])
>   
> HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
>     LogicalTableScan(table=[[tpcds, store_sales]])
>     LogicalTableScan(table=[[tpcds, customer_demographics]])
>     LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]) {code}
> After dphyp join reorder (with trimming fields and pushing down Project), the 
> plan is :
> {code:java}
> LogicalProject(i_item_id=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[inner])
>     LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
>       LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>         LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
>           LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>             LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], 
> ss_sold_date_sk=[$3])
>               LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>                 LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], 
> ss_promo_sk=[$7], ss_sold_date_sk=[$22])
>                   LogicalTableScan(table=[[tpcds, store_sales]])
>                 LogicalProject(p_promo_sk=[$0])
>                   LogicalTableScan(table=[[tpcds, promotion]])
>             LogicalProject(i_item_sk=[$0], i_item_id=[$1])
>               LogicalTableScan(table=[[tpcds, item]])
>         LogicalProject(d_date_sk=[$0])
>           LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalProject(cd_demo_sk=[$0])
>       LogicalTableScan(table=[[tpcds, customer_demographics]]) {code}
> The main enumeration process of dphyp will be implemented in pr. However, it 
> only can process inner join for now and the simplification of hypergraph has 
> not yet been implemented.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-6846) Support basic dphyp join reorder algorithm

Reply via email to