[
https://issues.apache.org/jira/browse/CALCITE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035750#comment-18035750
]
Silun Dong edited comment on CALCITE-7031 at 11/6/25 6:15 AM:
--------------------------------------------------------------
I try to implement the general decorrelation algorithm based on the paper and
submitted a draft for observation of some typical test cases.
The content includes:
# Add MARK JoinRelType.
# Allows generating a mark type Correlate during removing subquery.(Refers to
paper The Complete Story of Joins (in HyPer)).
# Implements most of the functions of the general decorrelation algorithm.
(Refers to paper Improving Unnesting of Complex Queries).
# Support simplifying mark join to semi/anti join.
Overall, I think the algorithm in the paper may have the following advantages
compared to current decorrelation:
# Add mark join type based on the paper The Complete Story of Joins (in HyPer).
# Decorrelation process supports handling more operator (such as SetOp).
# Can handle deep nestings of correlated subqueries.
# The algorithm is a complete top-down process that does not require rewriting
using RBO.
# There is a clear logic for simplifying mark join to semi/anti join.
This is currently just a draft, but it can provide a more intuitive reference
for discussion. If it can be further developed, there is still some work to be
done: adding mark join to the enumerable, considering the mark type in places
involving join optimization, further improving the algorithm based on the paper
(potentially eliminating inner joins to D based on the selectivity), etc.
was (Author: JIRAUSER308615):
I try to implement the general decorrelation algorithm based on the paper and
submitted a draft for observation of some typical test cases.
The content includes: #
Add MARK JoinRelType.
#
Allows generating a mark type Correlate during removing subquery.(Refers to
paper The Complete Story of Joins (in HyPer)).
#
Implements most of the functions of the general decorrelation algorithm.
(Refers to paper Improving Unnesting of Complex Queries).
#
Support simplifying mark join to semi/anti join.
Overall, I think the algorithm in the paper may have the following advantages
compared to current decorrelation: #
Add mark join type based on the paper The Complete Story of Joins (in HyPer).
#
Decorrelation process supports handling more operator (such as SetOp).
#
Can handle deep nestings of correlated subqueries.
#
The algorithm is a complete top-down process that does not require rewriting
using RBO.
#
There is a clear logic for simplifying mark join to semi/anti join.
This is currently just a draft, but it can provide a more intuitive reference
for discussion. If it can be further developed, there is still some work to be
done: adding mark join to the enumerable, considering the mark type in places
involving join optimization, further improving the algorithm based on the paper
(potentially eliminating inner joins to D based on the selectivity), etc.
> Implement the general decorrelation algorithm (Neumann & Kemper)
> ----------------------------------------------------------------
>
> Key: CALCITE-7031
> URL: https://issues.apache.org/jira/browse/CALCITE-7031
> Project: Calcite
> Issue Type: Wish
> Components: core
> Affects Versions: 1.39.0
> Reporter: Mihai Budiu
> Priority: Minor
>
> Today Calcite uses a heuristic decorrelator which looks for some specific
> patterns and replaces them with equivalent ones. This can handle a restricted
> set of queries.
> There is a general algorithm for doing this, at least for the standard
> operators:
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
> The algorithm is a series of rewrites rules, each of which pushes the
> decorrelation operators down in the plan towards leaves, until they can be
> eliminated.
> It would be great if Calcite had an implementation of this algorithm. This
> could be contributed in pieces, where each rewrite rule is a separate PR. I
> think these rules could be easily implemented in the existing planner rewrite
> framework.
> There are three design problems that I can foresee:
> 1) The plans generated by this algorithm are in general DAGs and not trees. I
> think trees would work, but they have the potential to be much larger than
> the corresponding DAGs (perhaps exponentially larger in the size of the
> original plan). I understand that there is a Calcite operator for
> representing DAGs; I don't know if that's the goal of the Spool operator - it
> seems to imply some kind of materialization of the result. The most important
> decision is how to represent DAG plans such that the rewriting framework
> continues to operate correctly.
> 2) A second issue is that the rewrite rules in the paper use a restricted
> form of relation D which has some nice properties (e.g., it is a set). I am
> not sure how such information can be represented in a Calcite plan, but I
> suspect this can be done.
> 3) Third, while the algorithm in the paper handles many SQL-like operators,
> the Calcite IR is even richer, supporting operators like Window, Cubes,
> Unnest, recursive queries, etc. I don't know how the algorithm would extend
> to plans containing such operators. But even if it doesn't handle all such
> operators, a general-purpose decorrelator would be a significant improvement
> over the existing one.
> This project would also give us the chance to close many issues related to
> the current decorrelator, some of which have been unsolved over many years.
> Please comment here if you are interested in this project. I think the most
> important problem to address is no 1 above, the handling of DAGs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)