[
https://issues.apache.org/jira/browse/CALCITE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944279#comment-17944279
]
Zhen Chen edited comment on CALCITE-6948 at 4/14/25 8:37 AM:
-------------------------------------------------------------
Hi [~julianhyde]
My apologies if my previous response wasn't sufficiently clear. Let me rephrase
the explanation, please correct me if any part remains inaccurate.
1. The SQL description I've added above, and it might not properly reflect this
rule's behavior.
SQL like:
from
{code:java}
select empid from emps
except
select empid from emps where empid > 150;{code}
to
{code:java}
SELECT empid
FROM emps A
WHERE NOT EXISTS (
SELECT 1
FROM emps B
WHERE B.empid > 150 and A.empid is not distinct from B.empid
); {code}
2. Only EXCEPT is currently supported; EXCEPT ALL is not yet implemented. Here
it only deleted one 100, I don't have a good way to support EXCEPT ALL.
{code:java}
CREATE TABLE emps (
empid int,
deptno int,
name varchar(255),
salary int,
commission int
);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps VALUES (200, 20, 'Eric', 8000, 500);
CREATE TABLE emps2 (
empid int,
deptno int,
name varchar(255),
salary int,
commission int
);
INSERT INTO emps2 VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps2 VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps2 VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps2 VALUES (200, 20, 'Eric', 8000, 500);
select empid from emps
except
select empid from emps2;
-- rerurns
-- empid
-- -------
-- (0 rows)
select empid from emps
except all
select empid from emps2;
-- rerurns
-- empid
-- -------
-- 100
-- 100
-- (2 rows) {code}
3. 2-way Minus. I think the rule supports repeated application on plans. While
Minus normally takes exactly two children if Minus have not been merged. When
this rule is applied multiple times to the plan, every Minus operator will be
optimized by the rule on each application. we can implement n-way Minus support
when required.
4. Converting IS NOT DISTINCT FROM to '=', I actually implemented this but
encountered issues with nested type nullability. After discussing with Mihai,
we agreed to keep IS NOT DISTINCT FROM and let other rules/logic handle the
conversion. If you recommend implementing it in this rule, I can modify it
accordingly.
was (Author: jensen):
Hi [~julianhyde]
My apologies if my previous response wasn't sufficiently clear. Let me rephrase
the explanation, please correct me if any part remains inaccurate.
1. The SQL description I've added above, and it might not properly reflect this
rule's behavior.
SQL like:
from
{code:java}
select empid from emps
except
select empid from emps where empid > 150;{code}
to
{code:java}
SELECT empid
FROM emps A
WHERE NOT EXISTS (
SELECT 1
FROM emps B
WHERE B.empid > 150 and A.empid is not distinct from B.empid
); {code}
2. Only EXCEPT is currently supported; EXCEPT ALL is not yet implemented. Here
it only deleted one 100, I don't have a good way to support EXCEPT ALL.
{code:java}
CREATE TABLE emps (
empid int,
deptno int,
name varchar(255),
salary int,
commission int
);INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps VALUES (200, 20, 'Eric', 8000, 500);CREATE TABLE emps2 (
empid int,
deptno int,
name varchar(255),
salary int,
commission int
);INSERT INTO emps2 VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps2 VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps2 VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps2 VALUES (200, 20, 'Eric', 8000, 500);select empid from emps
except
select empid from emps2;
-- rerurns
-- empid
-- -------
-- (0 rows)select empid from emps
except all
select empid from emps2;
-- rerurns
-- empid
-- -------
-- 100
-- 100
-- (2 rows) {code}
3. 2-way Minus. I think the rule supports repeated application on plans. While
Minus normally takes exactly two children if Minus have not been merged. When
this rule is applied multiple times to the plan, every Minus operator will be
optimized by the rule on each application. we can implement n-way Minus support
when required.
4. Converting IS NOT DISTINCT FROM to '=', I actually implemented this but
encountered issues with nested type nullability. After discussing with Mihai,
we agreed to keep IS NOT DISTINCT FROM and let other rules/logic handle the
conversion. If you recommend implementing it in this rule, I can modify it
accordingly.
> Implement MinusToAntiJoinRule
> -----------------------------
>
> Key: CALCITE-6948
> URL: https://issues.apache.org/jira/browse/CALCITE-6948
> Project: Calcite
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.39.0
> Reporter: Zhen Chen
> Assignee: Zhen Chen
> Priority: Major
> Labels: pull-request-available
> Fix For: 1.40.0
>
>
> Similar to IntersectToSemiJoinRule, we can implement MinusToAntiJoinRule.
> Like this:
> from:
> {code:java}
> LogicalMinus(all=[false])
> LogicalProject(ENAME=[$1])
> LogicalFilter(condition=[=($7, 10)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(ENAME=[$1])
> LogicalFilter(condition=[=($7, 20)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> to:
> {code:java}
> LogicalAggregate(group=[{0}])
> LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[anti])
> LogicalProject(ENAME=[$1])
> LogicalFilter(condition=[=($7, 10)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]])
> LogicalProject(ENAME=[$1])
> LogicalFilter(condition=[=($7, 20)])
> LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> SQL like:
> from
>
> {code:java}
> select empid from emps
> except
> select empid from emps where empid > 150;{code}
> to
>
> {code:java}
> SELECT empid
> FROM emps A
> WHERE NOT EXISTS (
> SELECT 1
> FROM emps B
> WHERE B.empid > 150 and A.empid is not distinct from B.empid
> ); {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)