[ 
https://issues.apache.org/jira/browse/CALCITE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035750#comment-18035750
 ] 

Silun Dong edited comment on CALCITE-7031 at 11/6/25 6:15 AM:
--------------------------------------------------------------

I try to implement the general decorrelation algorithm based on the paper and 
submitted a draft for observation of some typical test cases.
The content includes:
 # Add MARK JoinRelType.
 # Allows generating a mark type Correlate during removing subquery.(Refers to 
paper The Complete Story of Joins (in HyPer)).
 # Implements most of the functions of the general decorrelation algorithm. 
(Refers to paper Improving Unnesting of Complex Queries).
 # Support simplifying mark join to semi/anti join.

Overall, I think the algorithm in the paper may have the following advantages 
compared to current decorrelation:
 # Add mark join type based on the paper The Complete Story of Joins (in HyPer).
 # Decorrelation process supports handling more operator (such as SetOp).
 # Can handle deep nestings of correlated subqueries.
 # The algorithm is a complete top-down process that does not require rewriting 
using RBO.
 # There is a clear logic for simplifying mark join to semi/anti join.

This is currently just a draft, but it can provide a more intuitive reference 
for discussion. If it can be further developed, there is still some work to be 
done: adding mark join to the enumerable, considering the mark type in places 
involving join optimization, further improving the algorithm based on the paper 
(potentially eliminating inner joins to D based on the selectivity), etc.


was (Author: JIRAUSER308615):
I try to implement the general decorrelation algorithm based on the paper and 
submitted a draft for observation of some typical test cases.
The content includes: # 
Add MARK JoinRelType.
 # 
Allows generating a mark type Correlate during removing subquery.(Refers to 
paper The Complete Story of Joins (in HyPer)).
 # 
Implements most of the functions of the general decorrelation algorithm. 
(Refers to paper Improving Unnesting of Complex Queries).
 # 
Support simplifying mark join to semi/anti join.

Overall, I think the algorithm in the paper may have the following advantages 
compared to current decorrelation: # 
Add mark join type based on the paper The Complete Story of Joins (in HyPer).
 # 
Decorrelation process supports handling more operator (such as SetOp).
 # 
Can handle deep nestings of correlated subqueries.
 # 
The algorithm is a complete top-down process that does not require rewriting 
using RBO.
 # 
There is a clear logic for simplifying mark join to semi/anti join.

This is currently just a draft, but it can provide a more intuitive reference 
for discussion. If it can be further developed, there is still some work to be 
done: adding mark join to the enumerable, considering the mark type in places 
involving join optimization, further improving the algorithm based on the paper 
(potentially eliminating inner joins to D based on the selectivity), etc.

> Implement the general decorrelation algorithm (Neumann & Kemper)
> ----------------------------------------------------------------
>
>                 Key: CALCITE-7031
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7031
>             Project: Calcite
>          Issue Type: Wish
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Mihai Budiu
>            Priority: Minor
>
> Today Calcite uses a heuristic decorrelator which looks for some specific 
> patterns and replaces them with equivalent ones. This can handle a restricted 
> set of queries.
> There is a general algorithm for doing this, at least for the standard 
> operators: 
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
> The algorithm is a series of rewrites rules, each of which pushes the 
> decorrelation operators down in the plan towards leaves, until they can be 
> eliminated. 
> It would be great if Calcite had an implementation of this algorithm. This 
> could be contributed in pieces, where each rewrite rule is a separate PR. I 
> think these rules could be easily implemented in the existing planner rewrite 
> framework.
> There are three design problems that I can foresee:
> 1) The plans generated by this algorithm are in general DAGs and not trees. I 
> think trees would work, but they have the potential to be much larger than 
> the corresponding DAGs (perhaps exponentially larger in the size of the 
> original plan). I understand that there is a Calcite operator for 
> representing DAGs; I don't know if that's the goal of the Spool operator - it 
> seems to imply some kind of materialization of the result. The most important 
> decision is how to represent DAG plans such that the rewriting framework 
> continues to operate correctly.
> 2) A second issue is that the rewrite rules in the paper use a restricted 
> form of relation D which has some nice properties (e.g., it is a set). I am 
> not sure how such information can be represented in a Calcite plan, but I 
> suspect this can be done.
> 3) Third, while the algorithm in the paper handles many SQL-like operators, 
> the Calcite IR is even richer, supporting operators like Window, Cubes, 
> Unnest, recursive queries, etc. I don't know how the algorithm would extend 
> to plans containing such operators. But even if it doesn't handle all such 
> operators, a general-purpose decorrelator would be a significant improvement 
> over the existing one.
> This project would also give us the chance to close many issues related to 
> the current decorrelator, some of which have been unsolved over many years.
> Please comment here if you are interested in this project. I think the most 
> important problem to address is no 1 above, the handling of DAGs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to