[jira] [Comment Edited] (CALCITE-6948) Implement MinusToAntiJoinRule

Zhen Chen (Jira) Mon, 14 Apr 2025 01:39:58 -0700


    [ 
https://issues.apache.org/jira/browse/CALCITE-6948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17944279#comment-17944279
 ]


Zhen Chen edited comment on CALCITE-6948 at 4/14/25 8:37 AM:
-------------------------------------------------------------

Hi [~julianhyde] 

My apologies if my previous response wasn't sufficiently clear. Let me rephrase 
the explanation, please correct me if any part remains inaccurate.

1. The SQL description I've added above, and it might not properly reflect this 
rule's behavior.

SQL like:

from
{code:java}
select empid from emps
except
select empid from emps where empid > 150;{code}
to
{code:java}
SELECT empid
FROM emps A
WHERE NOT EXISTS (
    SELECT 1
    FROM emps B
    WHERE B.empid > 150 and A.empid is not distinct from B.empid
); {code}
2. Only EXCEPT is currently supported; EXCEPT ALL is not yet implemented. Here 
it only deleted one 100, I don't have a good way to support EXCEPT ALL.
{code:java}
CREATE TABLE emps (
    empid int,
    deptno int,
    name varchar(255),
    salary int,
    commission int
);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps VALUES (200, 20, 'Eric', 8000, 500);

CREATE TABLE emps2 (
    empid int,
    deptno int,
    name varchar(255),
    salary int,
    commission int
);
INSERT INTO emps2 VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps2 VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps2 VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps2 VALUES (200, 20, 'Eric', 8000, 500);

select empid from emps
except
select empid from emps2;
-- rerurns
--   empid 
--  -------
--  (0 rows)

select empid from emps
except all
select empid from emps2;
-- rerurns
--   empid 
--  -------
--   100
--   100
-- (2 rows) {code}
3. 2-way Minus. I think the rule supports repeated application on plans. While 
Minus normally takes exactly two children if Minus have not been merged. When 
this rule is applied multiple times to the plan, every Minus operator will be 
optimized by the rule on each application. we can implement n-way Minus support 
when required.

4. Converting IS NOT DISTINCT FROM to '=', I actually implemented this but 
encountered issues with nested type nullability. After discussing with Mihai, 
we agreed to keep IS NOT DISTINCT FROM  and let other rules/logic handle the 
conversion. If you recommend implementing it in this rule, I can modify it 
accordingly.


was (Author: jensen):
Hi [~julianhyde] 

My apologies if my previous response wasn't sufficiently clear. Let me rephrase 
the explanation, please correct me if any part remains inaccurate.

1. The SQL description I've added above, and it might not properly reflect this 
rule's behavior.

SQL like:

from
{code:java}
select empid from emps
except
select empid from emps where empid > 150;{code}
to
{code:java}
SELECT empid
FROM emps A
WHERE NOT EXISTS (
    SELECT 1
    FROM emps B
    WHERE B.empid > 150 and A.empid is not distinct from B.empid
); {code}
2. Only EXCEPT is currently supported; EXCEPT ALL is not yet implemented. Here 
it only deleted one 100, I don't have a good way to support EXCEPT ALL.

 
{code:java}
CREATE TABLE emps (
    empid int,
    deptno int,
    name varchar(255),
    salary int,
    commission int
);INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps VALUES (200, 20, 'Eric', 8000, 500);CREATE TABLE emps2 (
    empid int,
    deptno int,
    name varchar(255),
    salary int,
    commission int
);INSERT INTO emps2 VALUES (100, 10, 'Bill', 10000, 1000);
INSERT INTO emps2 VALUES (110, 10, 'Theodore', 11500, 250);
INSERT INTO emps2 VALUES (150, 10, 'Sebastian', 7000, null);
INSERT INTO emps2 VALUES (200, 20, 'Eric', 8000, 500);select empid from emps
except
select empid from emps2;
-- rerurns
--   empid 
--  -------
--  (0 rows)select empid from emps
except all
select empid from emps2;
-- rerurns
--   empid 
--  -------
--   100
--   100
-- (2 rows) {code}
3. 2-way Minus. I think the rule supports repeated application on plans. While 
Minus normally takes exactly two children if Minus have not been merged. When 
this rule is applied multiple times to the plan, every Minus operator will be 
optimized by the rule on each application. we can implement n-way Minus support 
when required.

4. Converting IS NOT DISTINCT FROM to '=', I actually implemented this but 
encountered issues with nested type nullability. After discussing with Mihai, 
we agreed to keep IS NOT DISTINCT FROM  and let other rules/logic handle the 
conversion. If you recommend implementing it in this rule, I can modify it 
accordingly.

 

 

 

> Implement MinusToAntiJoinRule
> -----------------------------
>
>                 Key: CALCITE-6948
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6948
>             Project: Calcite
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.39.0
>            Reporter: Zhen Chen
>            Assignee: Zhen Chen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.40.0
>
>
> Similar to IntersectToSemiJoinRule, we can implement MinusToAntiJoinRule. 
> Like this:
> from:
> {code:java}
> LogicalMinus(all=[false])
>   LogicalProject(ENAME=[$1])
>     LogicalFilter(condition=[=($7, 10)])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>   LogicalProject(ENAME=[$1])
>     LogicalFilter(condition=[=($7, 20)])
>       LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> to:
> {code:java}
> LogicalAggregate(group=[{0}])
>   LogicalJoin(condition=[IS NOT DISTINCT FROM($0, $1)], joinType=[anti])
>     LogicalProject(ENAME=[$1])
>       LogicalFilter(condition=[=($7, 10)])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]])
>     LogicalProject(ENAME=[$1])
>       LogicalFilter(condition=[=($7, 20)])
>         LogicalTableScan(table=[[CATALOG, SALES, EMP]]) {code}
> SQL like:
> from
>  
> {code:java}
> select empid from emps
> except
> select empid from emps where empid > 150;{code}
> to
>  
> {code:java}
> SELECT empid
> FROM emps A
> WHERE NOT EXISTS (
>     SELECT 1
>     FROM emps B
>     WHERE B.empid > 150 and A.empid is not distinct from B.empid
> ); {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Comment Edited] (CALCITE-6948) Implement MinusToAntiJoinRule

Reply via email to