[
https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17932492#comment-17932492
]
Silun Dong commented on CALCITE-6846:
-------------------------------------
For the three test (testChainJoinDphypJoinReorder,
testStarJoinDphypJoinReorder, testCycleJoinDphypJoinReorder) in this PR, I have
respectively tried to optimize with dphyp (CoreRules.JOIN_TO_HYPER_GRAPH +
CoreRule.HYPER_GRAPH_OPTIMIZE) and LoptOptimizeJoinRule
(CoreRules.JOIN_TO_MULTI_JOIN + CoreRules.MULTI_JOIN_OPTIMIZE), and check the
cost of relAfter on line 397 of RelOptFixture (I don't know of a more direct
way to show this, so I just printed it out and verify it myself.). The
comparison is as follows:
|| ||dphyp||LoptOptimizeJoinRule||
|chain|144.31199999999998|152.11200000000002|
|star|98.40728|184.33928|
|cycle|45.44|45.44|
DPhyp enumerates all possibilities of join graph without duplication (currently
PR can only handle inner join scenarios), so I think:
* When there are only inner join, dphyp must be better.
* When inner join mixed with other types of join, it's better to use the cost
to decide which algorithm is more efficient. If the code related dphyp is
improved in the future to handle all join types, then dphyp will be better.
* When there are too many Join tables, even if dphyp algorithm does not
enumerate repeatedly, it will be very time-consuming. However,
LoptOptimizeJoinRule can quickly get a relatively good result.
> Support basic dphyp join reorder algorithm
> ------------------------------------------
>
> Key: CALCITE-6846
> URL: https://issues.apache.org/jira/browse/CALCITE-6846
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.38.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.39.0
>
>
> Supports the basic dphyp join reorder algorithm.
> For example :
> {code:java}
> SELECT
> i_item_id
> FROM store_sales, customer_demographics, date_dim, item, promotion
> WHERE ss_sold_date_sk = d_date_sk AND
> ss_item_sk = i_item_sk AND
> ss_cdemo_sk = cd_demo_sk AND
> ss_promo_sk = p_promo_sk {code}
> The plan tree after pushing down filter :
> {code:java}
> LogicalProject(i_item_id=[$61])
> LogicalJoin(condition=[=($7, $82)], joinType=[inner])
> LogicalJoin(condition=[=($1, $60)], joinType=[inner])
> LogicalJoin(condition=[=($22, $32)], joinType=[inner])
> LogicalJoin(condition=[=($3, $23)], joinType=[inner])
> LogicalTableScan(table=[[tpcds, store_sales]])
> LogicalTableScan(table=[[tpcds, customer_demographics]])
> LogicalTableScan(table=[[tpcds, date_dim]])
> LogicalTableScan(table=[[tpcds, item]])
> LogicalTableScan(table=[[tpcds, promotion]]){code}
> Convert Joins into one HyperGraph :
> {code:java}
> LogicalProject(i_item_id=[$61])
>
> HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
> LogicalTableScan(table=[[tpcds, store_sales]])
> LogicalTableScan(table=[[tpcds, customer_demographics]])
> LogicalTableScan(table=[[tpcds, date_dim]])
> LogicalTableScan(table=[[tpcds, item]])
> LogicalTableScan(table=[[tpcds, promotion]]) {code}
> After dphyp join reorder (with trimming fields and pushing down Project), the
> plan is :
> {code:java}
> LogicalProject(i_item_id=[$1])
> LogicalJoin(condition=[=($0, $2)], joinType=[inner])
> LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
> LogicalJoin(condition=[=($1, $3)], joinType=[inner])
> LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
> LogicalJoin(condition=[=($0, $3)], joinType=[inner])
> LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1],
> ss_sold_date_sk=[$3])
> LogicalJoin(condition=[=($2, $4)], joinType=[inner])
> LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3],
> ss_promo_sk=[$7], ss_sold_date_sk=[$22])
> LogicalTableScan(table=[[tpcds, store_sales]])
> LogicalProject(p_promo_sk=[$0])
> LogicalTableScan(table=[[tpcds, promotion]])
> LogicalProject(i_item_sk=[$0], i_item_id=[$1])
> LogicalTableScan(table=[[tpcds, item]])
> LogicalProject(d_date_sk=[$0])
> LogicalTableScan(table=[[tpcds, date_dim]])
> LogicalProject(cd_demo_sk=[$0])
> LogicalTableScan(table=[[tpcds, customer_demographics]]) {code}
> The main enumeration process of dphyp will be implemented in pr. However, it
> only can process inner join for now and the simplification of hypergraph has
> not yet been implemented.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)