[
https://issues.apache.org/jira/browse/CALCITE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953326#comment-17953326
]
Silun Dong commented on CALCITE-7029:
-------------------------------------
Thanks for your comment. I think your suggestion makes sense. Related
discussions also took place in
[CALCITE-6846|https://issues.apache.org/jira/browse/CALCITE-6846], and the
conclusion at that time was to use the {{@Experimental}} annotation to mark
DpHyp-related code. Does the content of this PR need to be benchmarked with
DpHyp? If this is necessary, further discussion is needed on how to implement
it. Or create a new jira for the topic of CALCITE benchmark. Do other reviewers
have any suggestions?
> Support DPhyp to handle various join types
> ------------------------------------------
>
> Key: CALCITE-7029
> URL: https://issues.apache.org/jira/browse/CALCITE-7029
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.39.0
> Reporter: Silun Dong
> Assignee: Silun Dong
> Priority: Major
> Labels: pull-request-available
>
> Add conflict detection algorithm CD-C to DPhyp so that it can handle various
> join types.
> DpHyp algorithm is a join reorder algorithm based on dynamic programming. It
> can enumerate all possibilities of the query graph without duplication, and
> it can theoretically handle complex join predicates and various types of
> joins (outer, semi, anti, etc.).
> Calcite-6846 completed the basic DpHyp algorithm, defined the hypergraph
> structure and enumeration process. It can enumerate the query graph without
> duplication and handle complex join predicates, but is limited to inner join.
> The ability to handle various types of joins requires conflict detection.
> This pr implements the CD-C conflict detection algorithm based on the paper
> {_}On the correct and complete enumeration of the core search space{_}. The
> conflict detection algorithm does not change the enumeration process of
> DpHyp. It calculates the conflict rules for each join operator in the process
> of constructing the hypergraph from the plan tree, and verifies the
> applicability of csg-cmp according to the conflict rules when DpHyp
> enumerates csg-cmp.
>
> The following is an example of sql and the expected plan:
> sql
> {code:java}
> select emp.empno from
> emp_address inner join emp on emp_address.empno = emp.empno
> left join dept on emp.deptno = dept.deptno
> inner join dept_nested on dept.deptno = dept_nested.deptno {code}
> initial plan
> {code:java}
> LogicalProject(EMPNO=[$3])
> LogicalJoin(condition=[=($12, $14)], joinType=[inner])
> LogicalJoin(condition=[=($10, $12)], joinType=[left])
> LogicalJoin(condition=[=($0, $3)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> build hypergraph
> {code:java}
> LogicalProject(EMPNO=[$3])
> HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0),
> vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7),
> vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0),
> vertex(3)_field(0))]——{3}])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> after dphyp
> {code:java}
> LogicalProject(EMPNO=[$4])
> LogicalJoin(condition=[=($15, $4)], joinType=[inner])
> LogicalJoin(condition=[=($13, $0)], joinType=[inner])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
> LogicalJoin(condition=[=($7, $9)], joinType=[left])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)