[
https://issues.apache.org/jira/browse/CALCITE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035905#comment-18035905
]
Mihai Budiu commented on CALCITE-7031:
--------------------------------------
The paper talks about a "single join", which seems to help in decorrelation as
well.
Or is the existing SINGLE_VALUE aggregate sufficient?
It seems to be that an aggregate makes the plan harder to optimize.
> Implement the general decorrelation algorithm (Neumann & Kemper)
> ----------------------------------------------------------------
>
> Key: CALCITE-7031
> URL: https://issues.apache.org/jira/browse/CALCITE-7031
> Project: Calcite
> Issue Type: Wish
> Components: core
> Affects Versions: 1.39.0
> Reporter: Mihai Budiu
> Priority: Minor
> Labels: pull-request-available
>
> Today Calcite uses a heuristic decorrelator which looks for some specific
> patterns and replaces them with equivalent ones. This can handle a restricted
> set of queries.
> There is a general algorithm for doing this, at least for the standard
> operators:
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
> The algorithm is a series of rewrites rules, each of which pushes the
> decorrelation operators down in the plan towards leaves, until they can be
> eliminated.
> It would be great if Calcite had an implementation of this algorithm. This
> could be contributed in pieces, where each rewrite rule is a separate PR. I
> think these rules could be easily implemented in the existing planner rewrite
> framework.
> There are three design problems that I can foresee:
> 1) The plans generated by this algorithm are in general DAGs and not trees. I
> think trees would work, but they have the potential to be much larger than
> the corresponding DAGs (perhaps exponentially larger in the size of the
> original plan). I understand that there is a Calcite operator for
> representing DAGs; I don't know if that's the goal of the Spool operator - it
> seems to imply some kind of materialization of the result. The most important
> decision is how to represent DAG plans such that the rewriting framework
> continues to operate correctly.
> 2) A second issue is that the rewrite rules in the paper use a restricted
> form of relation D which has some nice properties (e.g., it is a set). I am
> not sure how such information can be represented in a Calcite plan, but I
> suspect this can be done.
> 3) Third, while the algorithm in the paper handles many SQL-like operators,
> the Calcite IR is even richer, supporting operators like Window, Cubes,
> Unnest, recursive queries, etc. I don't know how the algorithm would extend
> to plans containing such operators. But even if it doesn't handle all such
> operators, a general-purpose decorrelator would be a significant improvement
> over the existing one.
> This project would also give us the chance to close many issues related to
> the current decorrelator, some of which have been unsolved over many years.
> Please comment here if you are interested in this project. I think the most
> important problem to address is no 1 above, the handling of DAGs.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)