[
https://issues.apache.org/jira/browse/CALCITE-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18028975#comment-18028975
]
Ruben Q L commented on CALCITE-5716:
------------------------------------
Ok [~mbudiu], I gave it another try here:
https://github.com/apache/calcite/pull/4574
So the fix in {{SubQueryRemoveRule#matchFilter}} is clear and the unit tests
added into sub-query.iq pass fine with it.
We were reluctant to apply this patch originally, because it could "break"
downstream projects' plans that were working fine so far (although these plans
were not 100% correct, and contained a latent issue, that this patch would
unveil). We can see this in the PR too:
RelOptRuleTest#testCorrelatedFilterWithoutVariable would not work any more with
the patch. Notice that, in fact, the plan in this test was "incorrect", since
it created a filter with a Correlated Variable on its predicate, without
defining it on the variable set when creating the filter via RelBuilder. The PR
contains also the fix for this test, which is simply "declaring" the variable
while creating the filter.
We can decide to move forward, fix the original problem described in this
ticket, and if any downstream project suffers from the
"testCorrelatedFilterWithoutVariable issue" after upgrading, they'd need to fix
their plans (which were technically incorrect, and have been working so far a
bit "by chance").
> Two level nested correlated subquery translates to incorrect ON condition
> -------------------------------------------------------------------------
>
> Key: CALCITE-5716
> URL: https://issues.apache.org/jira/browse/CALCITE-5716
> Project: Calcite
> Issue Type: Improvement
> Reporter: John Wright
> Priority: Major
> Labels: pull-request-available
>
> I have a query that goes through the Calcite Core decorrelation rules and
> comes out with an ON condition that isn't correct and returns incorrect
> results to a different engine.
>
> {code:java}
> SELECT summary.id,
> (SELECT Max(detail.id)
> FROM detail
> WHERE date = (SELECT Max(date) AS maxDate
> FROM detail detail2
> WHERE detail2.summary_id = summary.id)
> AND detail.summary_id = summary.id) AS Detail_With_MaxDate,
> summary.data
> FROM summary {code}
> translates to:
> {code:java}
> select summary.ID AS id, t3.SINGLE_DETAIL AS Detail_With_MaxDate,
> summary.data AS data
> from summary
> left join (select t2.summary_id, MAX(t2.ID) AS SINGLE_DETAIL
> FROM (select t.summary_id, t.ID
> FROM (select *
> FROM detail
> WHERE summary_id IS NOT NULL) AS t
> INNER JOIN (select summary_id, MAX(date) AS maxDate
> FROM detail
> WHERE summary_id IS NOT NULL
> GROUP BY summary_id) AS t1
> ON t.ID = t1.summary_id
> WHERE t.date = t1.maxDate) AS t2
> INNER JOIN summary AS summary0 ON t2.summary_id = summary0.ID
> GROUP BY t2.summary_id) AS t3
> ON summary.ID = t3.summary_id {code}
> where the condition `ON t.ID = t1.summary_id` doesn't make any sense.
> [http://sqlfiddle.com/#!9/5758a6/3]
> includes testdata and the incorrect results (null for the one column).
> When I manually decorrelate the query I can get the correct answer:
> [http://sqlfiddle.com/#!9/5758a6/6]
> {code:java}
> SELECT summary.id,
> extraction.single_detail AS Detail_With_MaxDate,
> summary.data
> FROM summary
> JOIN (SELECT detail.summary_id,
> Max(detail.id) AS SINGLE_DETAIL
> FROM detail
> JOIN (SELECT summary_id,
> Max(date) AS maxDate
> FROM detail
> GROUP BY summary_id) maxDates
> ON detail.summary_id = maxDates.summary_id
> WHERE detail.date = maxDates.maxdate
> GROUP BY detail.summary_id) extraction
> ON summary.id = extraction.summary_id {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)