[
https://issues.apache.org/jira/browse/CALCITE-5716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17737204#comment-17737204
]
Ruben Q L edited comment on CALCITE-5716 at 6/26/23 4:15 PM:
-------------------------------------------------------------
After some discussion between [~libenchao] and myself, it was decided that, due
to potential regression risks, CALCITE-5683 will NOT include a fix for the
current issue (they are two separate, related problems), so I'm re-opening the
current ticket.
The current issue could be reproduced by adding the following tests in
sub-query.iq:
{code:sql}
select deptno from dept d1 where exists (
select 1 from dept d2 where d2.deptno = d1.deptno and exists (
select 1 from dept d3 where d3.dname = d1.dname));
+--------+
| DEPTNO |
+--------+
| 10 |
| 20 |
| 30 |
| 40 |
+--------+
(4 rows)
!ok
SELECT dept.deptno, (
SELECT max(emp.empno)
FROM emp
WHERE empno = (SELECT max(empno) AS maxDept
FROM emp e2
WHERE e2.deptno = dept.deptno)
AND emp.deptno = dept.deptno),
dept.dname
FROM dept;
+--------+--------+------------+
| DEPTNO | EXPR$1 | DNAME |
+--------+--------+------------+
| 10 | 7934 | ACCOUNTING |
| 20 | 7902 | RESEARCH |
| 30 | 7900 | SALES |
| 40 | | OPERATIONS |
+--------+--------+------------+
(4 rows)
!ok
{code}
It seems the fix could simply be changing the following line in
{{SubQueryRemoveRule#matchFilter}}:
{code:java}
if (!filterVariablesSet.isEmpty()) {
variablesSet.retainAll(filterVariablesSet);
}
==>
variablesSet.retainAll(filterVariablesSet);
{code}
However, that could break downstream projects which may be creating Filter
without correctly specifying their set of variables (see more info
[here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and
[here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so
for the moment (given the limited time before 1.35RC) we decided to NOT apply
it. This needs further discussion.
was (Author: rubenql):
After some discussion between [~libenchao] and myself, it was decided that, due
to potential regression risks, CALCITE-5683 will NOT include a fix for the
current issue (they are two separate, related problems), so I'm re-opening the
current ticket.
The current issue could be reproduced by adding the following tests in
sub-query.iq:
{code:sql}
select deptno from dept d1 where exists (
select 1 from dept d2 where d2.deptno = d1.deptno and exists (
select 1 from dept d3 where d3.dname = d1.dname));
+--------+
| DEPTNO |
+--------+
| 10 |
| 20 |
| 30 |
| 40 |
+--------+
(4 rows)
!ok
SELECT dept.deptno, (
SELECT max(emp.empno)
FROM emp
WHERE empno = (SELECT max(empno) AS maxDept
FROM emp e2
WHERE e2.deptno = dept.deptno)
AND emp.deptno = dept.deptno),
dept.dname
FROM dept;
+--------+--------+------------+
| DEPTNO | EXPR$1 | DNAME |
+--------+--------+------------+
| 10 | 7934 | ACCOUNTING |
| 20 | 7902 | RESEARCH |
| 30 | 7900 | SALES |
| 40 | | OPERATIONS |
+--------+--------+------------+
(4 rows)
!ok
{code}
The fix could simply be changing the following line in
{{SubQueryRemoveRule#matchFilter}}:
{code:java}
if (!filterVariablesSet.isEmpty()) {
variablesSet.retainAll(filterVariablesSet);
}
==>
variablesSet.retainAll(filterVariablesSet);
{code}
However, that could break downstream projects which may be creating Filter
without correctly specifying their set of variables (see more info
[here|https://github.com/apache/calcite/pull/3277#discussion_r1240592799] and
[here|https://github.com/apache/calcite/pull/3193#discussion_r1204993079]); so
for the moment (given the limited time before 1.35RC) we decided to NOT apply
it. This needs further discussion.
> Two level nested correlated subquery translates to incorrect ON condition
> -------------------------------------------------------------------------
>
> Key: CALCITE-5716
> URL: https://issues.apache.org/jira/browse/CALCITE-5716
> Project: Calcite
> Issue Type: Improvement
> Reporter: John Wright
> Priority: Major
>
> I have a query that goes through the Calcite Core decorrelation rules and
> comes out with an ON condition that isn't correct and returns incorrect
> results to a different engine.
>
> {code:java}
> SELECT summary.id,
> (SELECT Max(detail.id)
> FROM detail
> WHERE date = (SELECT Max(date) AS maxDate
> FROM detail detail2
> WHERE detail2.summary_id = summary.id)
> AND detail.summary_id = summary.id) AS Detail_With_MaxDate,
> summary.data
> FROM summary {code}
> translates to:
> {code:java}
> select summary.ID AS id, t3.SINGLE_DETAIL AS Detail_With_MaxDate,
> summary.data AS data
> from summary
> left join (select t2.summary_id, MAX(t2.ID) AS SINGLE_DETAIL
> FROM (select t.summary_id, t.ID
> FROM (select *
> FROM detail
> WHERE summary_id IS NOT NULL) AS t
> INNER JOIN (select summary_id, MAX(date) AS maxDate
> FROM detail
> WHERE summary_id IS NOT NULL
> GROUP BY summary_id) AS t1
> ON t.ID = t1.summary_id
> WHERE t.date = t1.maxDate) AS t2
> INNER JOIN summary AS summary0 ON t2.summary_id = summary0.ID
> GROUP BY t2.summary_id) AS t3
> ON summary.ID = t3.summary_id {code}
> where the condition `ON t.ID = t1.summary_id` doesn't make any sense.
> [http://sqlfiddle.com/#!9/5758a6/3]
> includes testdata and the incorrect results (null for the one column).
> When I manually decorrelate the query I can get the correct answer:
> [http://sqlfiddle.com/#!9/5758a6/6]
> {code:java}
> SELECT summary.id,
> extraction.single_detail AS Detail_With_MaxDate,
> summary.data
> FROM summary
> JOIN (SELECT detail.summary_id,
> Max(detail.id) AS SINGLE_DETAIL
> FROM detail
> JOIN (SELECT summary_id,
> Max(date) AS maxDate
> FROM detail
> GROUP BY summary_id) maxDates
> ON detail.summary_id = maxDates.summary_id
> WHERE detail.date = maxDates.maxdate
> GROUP BY detail.summary_id) extraction
> ON summary.id = extraction.summary_id {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)