[jira] [Commented] (CALCITE-7031) Implement the general decorrelation algorithm (Neumann & Kemper)

Mihai Budiu (Jira) Thu, 06 Nov 2025 09:20:09 -0800


    [ 
https://issues.apache.org/jira/browse/CALCITE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18035905#comment-18035905
 ]


Mihai Budiu commented on CALCITE-7031:
--------------------------------------

The paper talks about a "single join", which seems to help in decorrelation as 
well.
Or is the existing SINGLE_VALUE aggregate sufficient?
It seems to be that an aggregate makes the plan harder to optimize.


> Implement the general decorrelation algorithm (Neumann & Kemper)
> ----------------------------------------------------------------
>
>                 Key: CALCITE-7031
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7031
>             Project: Calcite
>          Issue Type: Wish
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Mihai Budiu
>            Priority: Minor
>              Labels: pull-request-available
>
> Today Calcite uses a heuristic decorrelator which looks for some specific 
> patterns and replaces them with equivalent ones. This can handle a restricted 
> set of queries.
> There is a general algorithm for doing this, at least for the standard 
> operators: 
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
> The algorithm is a series of rewrites rules, each of which pushes the 
> decorrelation operators down in the plan towards leaves, until they can be 
> eliminated. 
> It would be great if Calcite had an implementation of this algorithm. This 
> could be contributed in pieces, where each rewrite rule is a separate PR. I 
> think these rules could be easily implemented in the existing planner rewrite 
> framework.
> There are three design problems that I can foresee:
> 1) The plans generated by this algorithm are in general DAGs and not trees. I 
> think trees would work, but they have the potential to be much larger than 
> the corresponding DAGs (perhaps exponentially larger in the size of the 
> original plan). I understand that there is a Calcite operator for 
> representing DAGs; I don't know if that's the goal of the Spool operator - it 
> seems to imply some kind of materialization of the result. The most important 
> decision is how to represent DAG plans such that the rewriting framework 
> continues to operate correctly.
> 2) A second issue is that the rewrite rules in the paper use a restricted 
> form of relation D which has some nice properties (e.g., it is a set). I am 
> not sure how such information can be represented in a Calcite plan, but I 
> suspect this can be done.
> 3) Third, while the algorithm in the paper handles many SQL-like operators, 
> the Calcite IR is even richer, supporting operators like Window, Cubes, 
> Unnest, recursive queries, etc. I don't know how the algorithm would extend 
> to plans containing such operators. But even if it doesn't handle all such 
> operators, a general-purpose decorrelator would be a significant improvement 
> over the existing one.
> This project would also give us the chance to close many issues related to 
> the current decorrelator, some of which have been unsolved over many years.
> Please comment here if you are interested in this project. I think the most 
> important problem to address is no 1 above, the handling of DAGs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7031) Implement the general decorrelation algorithm (Neumann & Kemper)

Reply via email to