[jira] [Commented] (CALCITE-7029) Support DPhyp to handle various join types

Alessandro Solimando (Jira) Thu, 22 May 2025 05:12:52 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953428#comment-17953428
 ]


Alessandro Solimando commented on CALCITE-7029:
-----------------------------------------------

[~xuzifu666], ClickHouse is a full-fledged database, while Calcite is not, it 
is missing notably a storage layer. Calcite has an execution engine 
(_Enumerable_) but it aims more at providing a way to validate plans with 
actual results then else.

You should clarify what you mean by benchmark in this case, if it is planning + 
execution like some other DBs have, I am not sure that is achievable/desirable 
for Calcite, in its current form.

What was suggested for CALCITE-6846, as [~dongsl] says, refers to the join 
enumeration itself at the planning level, to see if the algorithm (or better, 
its implementation) can indeed beat what we currently have. DpHyp is still 
marked as experimental, so I guess we can add this extra feature without such 
benchmark.

For later, my feeling is that we should try and have a mechanism to compare 
join ordering algorithms we have, which would allows us to integrate more 
features in this complex area with more confidence, and compare alternatives 
effectively.

> Support DPhyp to handle various join types
> ------------------------------------------
>
>                 Key: CALCITE-7029
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7029
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Silun Dong
>            Assignee: Silun Dong
>            Priority: Major
>              Labels: pull-request-available
>
> Add conflict detection algorithm CD-C to DPhyp so that it can handle various 
> join types. 
> DpHyp algorithm is a join reorder algorithm based on dynamic programming. It 
> can enumerate all possibilities of the query graph without duplication, and 
> it can theoretically handle complex join predicates and various types of 
> joins (outer, semi, anti, etc.).
> Calcite-6846 completed the basic DpHyp algorithm, defined the hypergraph 
> structure and enumeration process. It can enumerate the query graph without 
> duplication and handle complex join predicates, but is limited to inner join. 
> The ability to handle various types of joins requires conflict detection.
> This pr implements the CD-C conflict detection algorithm based on the paper 
> {_}On the correct and complete enumeration of the core search space{_}. The 
> conflict detection algorithm does not change the enumeration process of 
> DpHyp. It calculates the conflict rules for each join operator in the process 
> of constructing the hypergraph from the plan tree, and verifies the 
> applicability of csg-cmp according to the conflict rules when DpHyp 
> enumerates csg-cmp.
>  
> The following is an example of sql and the expected plan:
> sql
> {code:java}
> select emp.empno from 
> emp_address inner join emp on emp_address.empno = emp.empno 
> left join dept on emp.deptno = dept.deptno 
> inner join dept_nested on dept.deptno = dept_nested.deptno {code}
> initial plan
> {code:java}
> LogicalProject(EMPNO=[$3])
>   LogicalJoin(condition=[=($12, $14)], joinType=[inner])
>     LogicalJoin(condition=[=($10, $12)], joinType=[left])
>       LogicalJoin(condition=[=($0, $3)], joinType=[inner])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> build hypergraph
> {code:java}
> LogicalProject(EMPNO=[$3])
>   HyperGraph(edges=[{0}——[INNER, =(vertex(0)_field(0), 
> vertex(1)_field(0))]——{1},{1}——[LEFT, =(vertex(1)_field(7), 
> vertex(2)_field(0))]——{2},{1, 2}——[INNER, =(vertex(2)_field(0), 
> vertex(3)_field(0))]——{3}])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]]) {code}
> after dphyp
> {code:java}
> LogicalProject(EMPNO=[$4])
>   LogicalJoin(condition=[=($15, $4)], joinType=[inner])
>     LogicalJoin(condition=[=($13, $0)], joinType=[inner])
>       LogicalTableScan(table=[[CATALOG, SALES, DEPT_NESTED]])
>       LogicalJoin(condition=[=($7, $9)], joinType=[left])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>         LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
>     LogicalTableScan(table=[[CATALOG, SALES, EMP_ADDRESS]]) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7029) Support DPhyp to handle various join types

Reply via email to