[jira] [Updated] (CALCITE-6629) Partitioned outer join (LEFT JOIN ... PARTITION BY and PARTITION BY ... RIGHT JOIN)

Julian Hyde (Jira) Fri, 18 Oct 2024 10:43:27 -0700


     [ 
https://issues.apache.org/jira/browse/CALCITE-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Julian Hyde updated CALCITE-6629:
---------------------------------
    Description: 
Partitioned outer join is a feature, generalizing outer join, that generates 
missing rows so that there is at least one row for each value of a partitioning 
column. For example,
{noformat}
select d.dname, d.deptno, e.job, e.ename
from dept d
left join emp e
  partition by (e.job)
  on e.deptno = d.deptno
order by 2;

DNAME      DEPTNO JOB       ENAME
========== ====== ========= ======
ACCOUNTING     10 PRESIDENT KING
ACCOUNTING     10 CLERK     MILLER
ACCOUNTING     10 MANAGER   CLARK
ACCOUNTING     10 SALESMAN  -
ACCOUNTING     10 ANALYST   -
RESEARCH       20 MANAGER   JONES
RESEARCH       20 CLERK     SMITH
RESEARCH       20 CLERK     ADAMS
RESEARCH       20 SALESMAN  -
RESEARCH       20 ANALYST   FORD
RESEARCH       20 ANALYST   SCOTT
RESEARCH       20 PRESIDENT -
SALES          30 CLERK     JAMES
SALES          30 ANALYST   -
SALES          30 MANAGER   BLAKE
SALES          30 SALESMAN  MARTIN
SALES          30 SALESMAN  ALLEN
SALES          30 SALESMAN  WARD
SALES          30 SALESMAN  TURNER
SALES          30 PRESIDENT -
OPERATIONS     40 CLERK     -
OPERATIONS     40 PRESIDENT -
OPERATIONS     40 ANALYST   -
OPERATIONS     40 SALESMAN  -
OPERATIONS     40 MANAGER   - {noformat}
Note that there are 5 additional rows for department 40 (one for each job) but 
also additional rows for the other departments.

The following query reverses the tables and uses a right join. It returns 
exactly the same results as the previous query. Note that "{{{}partition 
by{}}}" occurs before "{{{}right join{}}}".
{noformat}
select d.dname, d.deptno, e.job, e.ename
from emp e
  partition by (e.job)
right join dept d
  on e.deptno = d.deptno
order by 2;
{noformat}
If the non-null-generating side is a subset, the partition nevertheless takes 
the values of the whole table. For example the following returns a row for each 
job, even though there is no SALESMAN or ANALYST in department 10:
{noformat}
select d.dname, d.deptno, e.job, e.ename
from (select * from dept where deptno = 10) d
left join emp e
    partition by (e.job)
    on e.deptno = d.deptno;

DNAME      DEPTNO JOB       ENAME
========== ====== ========= ======
ACCOUNTING     10 PRESIDENT KING
ACCOUNTING     10 CLERK     MILLER
ACCOUNTING     10 MANAGER   CLARK
ACCOUNTING     10 SALESMAN  -
ACCOUNTING     10 ANALYST   -
{noformat}
Null can be a partitioning value (i.e. semantics are the same as {{GROUP BY}}):
{noformat}
select d.dname, d.deptno, e.job, e.ename, e.d1
from dept d
left join (select emp.*, nullif(deptno, 20) as d1 from emp) e
  partition by (e.d1)
  on e.deptno = d.deptno
order by 2;

DNAME      DEPTNO JOB       ENAME  D1
========== ====== ========= ====== ==
ACCOUNTING     10 PRESIDENT KING   10
ACCOUNTING     10 CLERK     MILLER 10
ACCOUNTING     10 MANAGER   CLARK  10
ACCOUNTING     10 -         -      30
ACCOUNTING     10 -         -       -
RESEARCH       20 MANAGER   JONES  20
RESEARCH       20 CLERK     SMITH  20
RESEARCH       20 CLERK     ADAMS  20
RESEARCH       20 ANALYST   FORD   20
RESEARCH       20 ANALYST   SCOTT  20
RESEARCH       20 -         -      10
RESEARCH       20 -         -      30
SALES          30 CLERK     JAMES  30
SALES          30 MANAGER   BLAKE  30
SALES          30 SALESMAN  MARTIN 30
SALES          30 SALESMAN  ALLEN  30
SALES          30 SALESMAN  WARD   30
SALES          30 SALESMAN  TURNER 30
SALES          30 -         -      10
SALES          30 -         -       -
OPERATIONS     40 -         -      10
OPERATIONS     40 -         -      30
OPERATIONS     40 -         -       -{noformat}


  was:
Partitioned outer join is a feature, generalizing outer join, that generates 
missing rows so that there is at least one row for each value of a paritioning 
column. For example,
{noformat}
select d.dname, d.deptno, e.job, e.ename
from dept d
left join emp e
    partition by (e.job)
    on e.deptno = d.deptno
order by 2;

DNAME      DEPTNO JOB       ENAME
========== ====== ========= ======
ACCOUNTING     10 PRESIDENT KING
ACCOUNTING     10 CLERK     MILLER
ACCOUNTING     10 MANAGER   CLARK
ACCOUNTING     10 SALESMAN  - 
ACCOUNTING     10 ANALYST   - 
RESEARCH       20 MANAGER   JONES
RESEARCH       20 CLERK     SMITH
RESEARCH       20 CLERK     ADAMS
RESEARCH       20 SALESMAN  - 
RESEARCH       20 ANALYST   FORD
RESEARCH       20 ANALYST   SCOTT
RESEARCH       20 PRESIDENT - 
SALES          30 CLERK     JAMES
SALES          30 ANALYST   - 
SALES          30 MANAGER   BLAKE
SALES          30 SALESMAN  MARTIN
SALES          30 SALESMAN  ALLEN
SALES          30 SALESMAN  WARD
SALES          30 SALESMAN  TURNER
SALES          30 PRESIDENT - 
OPERATIONS     40 CLERK     - 
OPERATIONS     40 PRESIDENT - 
OPERATIONS     40 ANALYST   - 
OPERATIONS     40 SALESMAN  - 
OPERATIONS     40 MANAGER   - {noformat}

Note that there are 5 additional rows for department 40 (one for each job) but 
also additional rows for the other departments.

The following query reverses the tables and uses a right join. It returns 
exactly the same results as the previous query. Note that "{{partition by}}" 
occurs before "{{right join}}". 
{noformat}
select d.dname, d.deptno, e.job, e.ename
from emp e
  partition by (e.job) 
right join dept d
    on e.deptno = d.deptno
order by 2;
{noformat}


> Partitioned outer join (LEFT JOIN ... PARTITION BY and PARTITION BY ... RIGHT 
> JOIN)
> -----------------------------------------------------------------------------------
>
>                 Key: CALCITE-6629
>                 URL: https://issues.apache.org/jira/browse/CALCITE-6629
>             Project: Calcite
>          Issue Type: Bug
>            Reporter: Julian Hyde
>            Priority: Major
>
> Partitioned outer join is a feature, generalizing outer join, that generates 
> missing rows so that there is at least one row for each value of a 
> partitioning column. For example,
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from dept d
> left join emp e
>   partition by (e.job)
>   on e.deptno = d.deptno
> order by 2;
> DNAME      DEPTNO JOB       ENAME
> ========== ====== ========= ======
> ACCOUNTING     10 PRESIDENT KING
> ACCOUNTING     10 CLERK     MILLER
> ACCOUNTING     10 MANAGER   CLARK
> ACCOUNTING     10 SALESMAN  -
> ACCOUNTING     10 ANALYST   -
> RESEARCH       20 MANAGER   JONES
> RESEARCH       20 CLERK     SMITH
> RESEARCH       20 CLERK     ADAMS
> RESEARCH       20 SALESMAN  -
> RESEARCH       20 ANALYST   FORD
> RESEARCH       20 ANALYST   SCOTT
> RESEARCH       20 PRESIDENT -
> SALES          30 CLERK     JAMES
> SALES          30 ANALYST   -
> SALES          30 MANAGER   BLAKE
> SALES          30 SALESMAN  MARTIN
> SALES          30 SALESMAN  ALLEN
> SALES          30 SALESMAN  WARD
> SALES          30 SALESMAN  TURNER
> SALES          30 PRESIDENT -
> OPERATIONS     40 CLERK     -
> OPERATIONS     40 PRESIDENT -
> OPERATIONS     40 ANALYST   -
> OPERATIONS     40 SALESMAN  -
> OPERATIONS     40 MANAGER   - {noformat}
> Note that there are 5 additional rows for department 40 (one for each job) 
> but also additional rows for the other departments.
> The following query reverses the tables and uses a right join. It returns 
> exactly the same results as the previous query. Note that "{{{}partition 
> by{}}}" occurs before "{{{}right join{}}}".
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from emp e
>   partition by (e.job)
> right join dept d
>   on e.deptno = d.deptno
> order by 2;
> {noformat}
> If the non-null-generating side is a subset, the partition nevertheless takes 
> the values of the whole table. For example the following returns a row for 
> each job, even though there is no SALESMAN or ANALYST in department 10:
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from (select * from dept where deptno = 10) d
> left join emp e
>     partition by (e.job)
>     on e.deptno = d.deptno;
> DNAME      DEPTNO JOB       ENAME
> ========== ====== ========= ======
> ACCOUNTING     10 PRESIDENT KING
> ACCOUNTING     10 CLERK     MILLER
> ACCOUNTING     10 MANAGER   CLARK
> ACCOUNTING     10 SALESMAN  -
> ACCOUNTING     10 ANALYST   -
> {noformat}
> Null can be a partitioning value (i.e. semantics are the same as {{GROUP 
> BY}}):
> {noformat}
> select d.dname, d.deptno, e.job, e.ename, e.d1
> from dept d
> left join (select emp.*, nullif(deptno, 20) as d1 from emp) e
>   partition by (e.d1)
>   on e.deptno = d.deptno
> order by 2;
> DNAME      DEPTNO JOB       ENAME  D1
> ========== ====== ========= ====== ==
> ACCOUNTING     10 PRESIDENT KING   10
> ACCOUNTING     10 CLERK     MILLER 10
> ACCOUNTING     10 MANAGER   CLARK  10
> ACCOUNTING     10 -         -      30
> ACCOUNTING     10 -         -       -
> RESEARCH       20 MANAGER   JONES  20
> RESEARCH       20 CLERK     SMITH  20
> RESEARCH       20 CLERK     ADAMS  20
> RESEARCH       20 ANALYST   FORD   20
> RESEARCH       20 ANALYST   SCOTT  20
> RESEARCH       20 -         -      10
> RESEARCH       20 -         -      30
> SALES          30 CLERK     JAMES  30
> SALES          30 MANAGER   BLAKE  30
> SALES          30 SALESMAN  MARTIN 30
> SALES          30 SALESMAN  ALLEN  30
> SALES          30 SALESMAN  WARD   30
> SALES          30 SALESMAN  TURNER 30
> SALES          30 -         -      10
> SALES          30 -         -       -
> OPERATIONS     40 -         -      10
> OPERATIONS     40 -         -      30
> OPERATIONS     40 -         -       -{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Updated] (CALCITE-6629) Partitioned outer join (LEFT JOIN ... PARTITION BY and PARTITION BY ... RIGHT JOIN)

Reply via email to