[jira] [Commented] (CALCITE-7031) Implement the general decorrelation algorithm (Neumann & Kemper)

Stamatis Zampetakis (Jira) Wed, 21 May 2025 00:31:04 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-7031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953072#comment-17953072
 ]


Stamatis Zampetakis commented on CALCITE-7031:
----------------------------------------------

I agree with [~suibianwanwan33]. The RelDecorrelator has some heuristic parts 
but overall it resembles the algorithm described by Neumann & Kemper.

[~mbudiu] Note that DAGs is not a must have for the algorithm to work. I think 
Calcite work around this by somehow "duplicating" a sub-tree. If I recall well 
this logic is inside the 
[RelDecorrelator#createValueGenerator|https://github.com/apache/calcite/blob/1d4b1fd6e9e6953f43ee5c985b3e4aa10ce568e1/core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java#L1012]
 method. The distinct set D is probably created [in that 
method|https://github.com/apache/calcite/blob/1d4b1fd6e9e6953f43ee5c985b3e4aa10ce568e1/core/src/main/java/org/apache/calcite/sql2rel/RelDecorrelator.java#L1065]
 as well.

I am not against coming up with an implementation that is closer to the paper 
but I think we we need to understand what exactly are we missing from that 
paper.

> Implement the general decorrelation algorithm (Neumann & Kemper)
> ----------------------------------------------------------------
>
>                 Key: CALCITE-7031
>                 URL: https://issues.apache.org/jira/browse/CALCITE-7031
>             Project: Calcite
>          Issue Type: Wish
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Mihai Budiu
>            Priority: Minor
>
> Today Calcite uses a heuristic decorrelator which looks for some specific 
> patterns and replaces them with equivalent ones. This can handle a restricted 
> set of queries.
> There is a general algorithm for doing this, at least for the standard 
> operators: 
> https://github.com/lonng/db-papers/blob/main/papers/nested-query/unnesting-arbitrary-queries.pdf
> The algorithm is a series of rewrites rules, each of which pushes the 
> decorrelation operators down in the plan towards leaves, until they can be 
> eliminated. 
> It would be great if Calcite had an implementation of this algorithm. This 
> could be contributed in pieces, where each rewrite rule is a separate PR. I 
> think these rules could be easily implemented in the existing planner rewrite 
> framework.
> There are three design problems that I can foresee:
> 1) The plans generated by this algorithm are in general DAGs and not trees. I 
> think trees would work, but they have the potential to be much larger than 
> the corresponding DAGs (perhaps exponentially larger in the size of the 
> original plan). I understand that there is a Calcite operator for 
> representing DAGs; I don't know if that's the goal of the Spool operator - it 
> seems to imply some kind of materialization of the result. The most important 
> decision is how to represent DAG plans such that the rewriting framework 
> continues to operate correctly.
> 2) A second issue is that the rewrite rules in the paper use a restricted 
> form of relation D which has some nice properties (e.g., it is a set). I am 
> not sure how such information can be represented in a Calcite plan, but I 
> suspect this can be done.
> 3) Third, while the algorithm in the paper handles many SQL-like operators, 
> the Calcite IR is even richer, supporting operators like Window, Cubes, 
> Unnest, recursive queries, etc. I don't know how the algorithm would extend 
> to plans containing such operators. But even if it doesn't handle all such 
> operators, a general-purpose decorrelator would be a significant improvement 
> over the existing one.
> This project would also give us the chance to close many issues related to 
> the current decorrelator, some of which have been unsolved over many years.
> Please comment here if you are interested in this project. I think the most 
> important problem to address is no 1 above, the handling of DAGs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (CALCITE-7031) Implement the general decorrelation algorithm (Neumann & Kemper)

Reply via email to