[
https://issues.apache.org/jira/browse/CALCITE-6846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18005540#comment-18005540
]
Stamatis Zampetakis commented on CALCITE-6846:
----------------------------------------------
This feature is closed and released, so any problems, bugs, or improvements
should be logged in a new ticket and followed up there. Cost estimation and
join enumeration are two different things, so it would be useful to clarify
exactly where the problem is.

The cost function should be a black box for the algorithm. This is usually the
case if the RelMetadata/RelMetadataQuery classes are used correctly. Any
application can extend and override the metadata providers to adapt the costing
to its needs. If you want to account for parallelism, you should modify the
cost model accordingly, but this should not require changes to the algorithm.
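To make the separation concrete, here is a minimal, self-contained Java sketch.
It is not Calcite code and does not use the RelMetadataQuery API: the
CostModel interface, the Plan class, the row counts, and the single uniform
selectivity are all hypothetical, and the enumerator is a plain subset DP
rather than DPhyp. It only shows that the enumerator can stay untouched while
the cost model is swapped for one that assumes some parallelism.

```java
import java.util.Locale;

// Illustrative only: none of these types are Calcite APIs.
final class BlackBoxCostSketch {

  /** Hypothetical cost model: the enumerator calls it but never looks inside. */
  interface CostModel {
    double joinCost(double leftRows, double rightRows, double outputRows);
  }

  /** Best plan found so far for one set of relations. */
  static final class Plan {
    final String tree;
    final double rows;
    final double cost;

    Plan(String tree, double rows, double cost) {
      this.tree = tree;
      this.rows = rows;
      this.cost = cost;
    }
  }

  /**
   * Exhaustive dynamic programming over relation subsets encoded as bitmasks.
   * This is a plain subset DP, not DPhyp; cross products are allowed and one
   * made-up selectivity is applied to every join to keep the sketch short.
   */
  static Plan bestPlan(String[] names, double[] rows, double selectivity, CostModel model) {
    int n = names.length;
    Plan[] best = new Plan[1 << n];
    for (int i = 0; i < n; i++) {
      best[1 << i] = new Plan(names[i], rows[i], 0.0);
    }
    for (int set = 1; set < (1 << n); set++) {
      if (Integer.bitCount(set) < 2) {
        continue; // single relations were seeded above
      }
      // Try every split of 'set' into two non-empty halves.
      for (int left = (set - 1) & set; left > 0; left = (left - 1) & set) {
        Plan l = best[left];
        Plan r = best[set ^ left];
        double out = l.rows * r.rows * selectivity;
        double cost = l.cost + r.cost + model.joinCost(l.rows, r.rows, out);
        if (best[set] == null || cost < best[set].cost) {
          best[set] = new Plan("(" + l.tree + " JOIN " + r.tree + ")", out, cost);
        }
      }
    }
    return best[(1 << n) - 1];
  }

  public static void main(String[] args) {
    String[] names = {"store_sales", "date_dim", "item"};
    double[] rows = {1_000_000, 1_000, 10_000}; // made-up row counts

    CostModel sequential = (l, r, out) -> l + r + out;
    int workers = 4; // hypothetical degree of parallelism
    // Assumes the left input and the output are split across workers,
    // while the right input is processed in full.
    CostModel parallel = (l, r, out) -> r + (l + out) / workers;

    for (CostModel model : new CostModel[] {sequential, parallel}) {
      Plan p = bestPlan(names, rows, 0.001, model);
      System.out.printf(Locale.ROOT, "%s cost=%.1f%n", p.tree, p.cost);
    }
  }
}
```

Both runs call the same bestPlan method; only the CostModel lambda differs. In
Calcite the equivalent hook would be the metadata providers rather than a
lambda, but the division of responsibility is the same.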
> Support basic DPhyp join reorder algorithm
> ------------------------------------------
>
> Key: CALCITE-6846
> URL: https://issues.apache.org/jira/browse/CALCITE-6846
> Project: Calcite
> Issue Type: New Feature
> Components: core
> Affects Versions: 1.38.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.39.0
>
>
> Supports the basic DPhyp join reorder algorithm.
> For example:
> {code:java}
> SELECT
> i_item_id
> FROM store_sales, customer_demographics, date_dim, item, promotion
> WHERE ss_sold_date_sk = d_date_sk AND
> ss_item_sk = i_item_sk AND
> ss_cdemo_sk = cd_demo_sk AND
> ss_promo_sk = p_promo_sk {code}
> The plan tree after pushing down the filter:
> {code:java}
> LogicalProject(i_item_id=[$61])
>   LogicalJoin(condition=[=($7, $82)], joinType=[inner])
>     LogicalJoin(condition=[=($1, $60)], joinType=[inner])
>       LogicalJoin(condition=[=($22, $32)], joinType=[inner])
>         LogicalJoin(condition=[=($3, $23)], joinType=[inner])
>           LogicalTableScan(table=[[tpcds, store_sales]])
>           LogicalTableScan(table=[[tpcds, customer_demographics]])
>         LogicalTableScan(table=[[tpcds, date_dim]])
>       LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]){code}
> After converting the joins into one HyperGraph:
> {code:java}
> LogicalProject(i_item_id=[$61])
>   HyperGraph(edges=[{0}——INNER——{1},{0}——INNER——{2},{0}——INNER——{3},{0}——INNER——{4}])
>     LogicalTableScan(table=[[tpcds, store_sales]])
>     LogicalTableScan(table=[[tpcds, customer_demographics]])
>     LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalTableScan(table=[[tpcds, item]])
>     LogicalTableScan(table=[[tpcds, promotion]]) {code}
> After DPhyp join reorder (with field trimming and Project push-down), the
> plan is:
> {code:java}
> LogicalProject(i_item_id=[$1])
>   LogicalJoin(condition=[=($0, $2)], joinType=[inner])
>     LogicalProject(ss_cdemo_sk=[$0], i_item_id=[$2])
>       LogicalJoin(condition=[=($1, $3)], joinType=[inner])
>         LogicalProject(ss_cdemo_sk=[$1], ss_sold_date_sk=[$2], i_item_id=[$4])
>           LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>             LogicalProject(ss_item_sk=[$0], ss_cdemo_sk=[$1], ss_sold_date_sk=[$3])
>               LogicalJoin(condition=[=($2, $4)], joinType=[inner])
>                 LogicalProject(ss_item_sk=[$1], ss_cdemo_sk=[$3], ss_promo_sk=[$7], ss_sold_date_sk=[$22])
>                   LogicalTableScan(table=[[tpcds, store_sales]])
>                 LogicalProject(p_promo_sk=[$0])
>                   LogicalTableScan(table=[[tpcds, promotion]])
>             LogicalProject(i_item_sk=[$0], i_item_id=[$1])
>               LogicalTableScan(table=[[tpcds, item]])
>         LogicalProject(d_date_sk=[$0])
>           LogicalTableScan(table=[[tpcds, date_dim]])
>     LogicalProject(cd_demo_sk=[$0])
>       LogicalTableScan(table=[[tpcds, customer_demographics]]) {code}
> The main enumeration process of DPhyp will be implemented in the PR. However,
> it can only process inner joins for now, and hypergraph simplification has
> not yet been implemented.