[ 
https://issues.apache.org/jira/browse/CALCITE-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028975#comment-18028975
 ] 

Ruben Q L commented on CALCITE-5716:
------------------------------------

Ok [~mbudiu], I gave it another try here: 
https://github.com/apache/calcite/pull/4574
So the fix in {{SubQueryRemoveRule#matchFilter}} is clear and the unit tests 
added into sub-query.iq pass fine with it.
We were reluctant to apply this patch originally, because it could "break" 
downstream projects' plans that were working fine so far (although these plans 
were not 100% correct, and contained a latent issue, that this patch would 
unveil). We can see this in the PR too: 
RelOptRuleTest#testCorrelatedFilterWithoutVariable would not work any more with 
the patch. Notice that, in fact, the plan in this test was "incorrect", since 
it created a filter with a Correlated Variable on its predicate, without 
defining it on the variable set when creating the filter via RelBuilder. The PR 
contains also the fix for this test, which is simply "declaring" the variable 
while creating the filter.
We can decide to move forward, fix the original problem described in this 
ticket, and if any downstream project suffers from the 
"testCorrelatedFilterWithoutVariable issue" after upgrading, they'd need to fix 
their plans (which were technically incorrect, and have been working so far a 
bit "by chance").

> Two level nested correlated subquery translates to incorrect ON condition
> -------------------------------------------------------------------------
>
>                 Key: CALCITE-5716
>                 URL: https://issues.apache.org/jira/browse/CALCITE-5716
>             Project: Calcite
>          Issue Type: Improvement
>            Reporter: John Wright
>            Priority: Major
>              Labels: pull-request-available
>
> I have a query that goes through the Calcite Core decorrelation rules and 
> comes out with an ON condition that isn't correct and returns incorrect 
> results to a different engine.
>  
> {code:java}
>  SELECT summary.id,
>        (SELECT Max(detail.id)
>         FROM   detail
>         WHERE  date = (SELECT Max(date) AS maxDate
>                        FROM   detail detail2
>                        WHERE  detail2.summary_id = summary.id)
>                AND detail.summary_id = summary.id) AS Detail_With_MaxDate,
>        summary.data
> FROM   summary  {code}
> translates to:
> {code:java}
> select summary.ID AS id, t3.SINGLE_DETAIL AS Detail_With_MaxDate, 
> summary.data AS data
> from summary
> left join (select t2.summary_id, MAX(t2.ID) AS SINGLE_DETAIL
>     FROM (select t.summary_id, t.ID
>         FROM (select *
>             FROM detail
>             WHERE summary_id IS NOT NULL) AS t
>             INNER JOIN (select summary_id, MAX(date) AS maxDate
>                         FROM detail
>                         WHERE summary_id IS NOT NULL
>                         GROUP BY summary_id) AS t1 
>             ON t.ID = t1.summary_id
>             WHERE t.date = t1.maxDate) AS t2
>     INNER JOIN summary AS summary0 ON t2.summary_id = summary0.ID
>     GROUP BY t2.summary_id) AS t3 
> ON summary.ID = t3.summary_id {code}
> where the condition `ON t.ID = t1.summary_id` doesn't make any sense.
> [http://sqlfiddle.com/#!9/5758a6/3]
> includes testdata and the incorrect results (null for the one column).
> When I manually decorrelate the query I can get the correct answer:
> [http://sqlfiddle.com/#!9/5758a6/6]
> {code:java}
> SELECT summary.id,
>        extraction.single_detail AS Detail_With_MaxDate,
>        summary.data
> FROM   summary
>        JOIN (SELECT detail.summary_id,
>                     Max(detail.id) AS SINGLE_DETAIL
>              FROM   detail
>                     JOIN (SELECT summary_id,
>                                  Max(date) AS maxDate
>                           FROM   detail
>                           GROUP  BY summary_id) maxDates
>                       ON detail.summary_id = maxDates.summary_id
>              WHERE  detail.date = maxDates.maxdate
>              GROUP  BY detail.summary_id) extraction
>          ON summary.id = extraction.summary_id  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to