[
https://issues.apache.org/jira/browse/CALCITE-6629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julian Hyde updated CALCITE-6629:
---------------------------------
Description:
Partitioned outer join is a feature, generalizing outer join, that generates
missing rows so that there is at least one row for each value of a partitioning
column. For example,
{noformat}
select d.dname, d.deptno, e.job, e.ename
from dept d
left join emp e
partition by (e.job)
on e.deptno = d.deptno
order by 2;
DNAME DEPTNO JOB ENAME
========== ====== ========= ======
ACCOUNTING 10 PRESIDENT KING
ACCOUNTING 10 CLERK MILLER
ACCOUNTING 10 MANAGER CLARK
ACCOUNTING 10 SALESMAN -
ACCOUNTING 10 ANALYST -
RESEARCH 20 MANAGER JONES
RESEARCH 20 CLERK SMITH
RESEARCH 20 CLERK ADAMS
RESEARCH 20 SALESMAN -
RESEARCH 20 ANALYST FORD
RESEARCH 20 ANALYST SCOTT
RESEARCH 20 PRESIDENT -
SALES 30 CLERK JAMES
SALES 30 ANALYST -
SALES 30 MANAGER BLAKE
SALES 30 SALESMAN MARTIN
SALES 30 SALESMAN ALLEN
SALES 30 SALESMAN WARD
SALES 30 SALESMAN TURNER
SALES 30 PRESIDENT -
OPERATIONS 40 CLERK -
OPERATIONS 40 PRESIDENT -
OPERATIONS 40 ANALYST -
OPERATIONS 40 SALESMAN -
OPERATIONS 40 MANAGER - {noformat}
Note that there are 5 additional rows for department 40 (one for each job) but
also additional rows for the other departments.
The following query reverses the tables and uses a right join. It returns
exactly the same results as the previous query. Note that "{{{}partition
by{}}}" occurs before "{{{}right join{}}}".
{noformat}
select d.dname, d.deptno, e.job, e.ename
from emp e
partition by (e.job)
right join dept d
on e.deptno = d.deptno
order by 2;
{noformat}
If the non-null-generating side is a subset, the partition nevertheless takes
the values of the whole table. For example the following returns a row for each
job, even though there is no SALESMAN or ANALYST in department 10:
{noformat}
select d.dname, d.deptno, e.job, e.ename
from (select * from dept where deptno = 10) d
left join emp e
partition by (e.job)
on e.deptno = d.deptno;
DNAME DEPTNO JOB ENAME
========== ====== ========= ======
ACCOUNTING 10 PRESIDENT KING
ACCOUNTING 10 CLERK MILLER
ACCOUNTING 10 MANAGER CLARK
ACCOUNTING 10 SALESMAN -
ACCOUNTING 10 ANALYST -
{noformat}
Null can be a partitioning value (i.e. semantics are the same as {{GROUP BY}}):
{noformat}
select d.dname, d.deptno, e.job, e.ename, e.d1
from dept d
left join (select emp.*, nullif(deptno, 20) as d1 from emp) e
partition by (e.d1)
on e.deptno = d.deptno
order by 2;
DNAME DEPTNO JOB ENAME D1
========== ====== ========= ====== ==
ACCOUNTING 10 PRESIDENT KING 10
ACCOUNTING 10 CLERK MILLER 10
ACCOUNTING 10 MANAGER CLARK 10
ACCOUNTING 10 - - 30
ACCOUNTING 10 - - -
RESEARCH 20 MANAGER JONES 20
RESEARCH 20 CLERK SMITH 20
RESEARCH 20 CLERK ADAMS 20
RESEARCH 20 ANALYST FORD 20
RESEARCH 20 ANALYST SCOTT 20
RESEARCH 20 - - 10
RESEARCH 20 - - 30
SALES 30 CLERK JAMES 30
SALES 30 MANAGER BLAKE 30
SALES 30 SALESMAN MARTIN 30
SALES 30 SALESMAN ALLEN 30
SALES 30 SALESMAN WARD 30
SALES 30 SALESMAN TURNER 30
SALES 30 - - 10
SALES 30 - - -
OPERATIONS 40 - - 10
OPERATIONS 40 - - 30
OPERATIONS 40 - - -{noformat}
was:
Partitioned outer join is a feature, generalizing outer join, that generates
missing rows so that there is at least one row for each value of a paritioning
column. For example,
{noformat}
select d.dname, d.deptno, e.job, e.ename
from dept d
left join emp e
partition by (e.job)
on e.deptno = d.deptno
order by 2;
DNAME DEPTNO JOB ENAME
========== ====== ========= ======
ACCOUNTING 10 PRESIDENT KING
ACCOUNTING 10 CLERK MILLER
ACCOUNTING 10 MANAGER CLARK
ACCOUNTING 10 SALESMAN -
ACCOUNTING 10 ANALYST -
RESEARCH 20 MANAGER JONES
RESEARCH 20 CLERK SMITH
RESEARCH 20 CLERK ADAMS
RESEARCH 20 SALESMAN -
RESEARCH 20 ANALYST FORD
RESEARCH 20 ANALYST SCOTT
RESEARCH 20 PRESIDENT -
SALES 30 CLERK JAMES
SALES 30 ANALYST -
SALES 30 MANAGER BLAKE
SALES 30 SALESMAN MARTIN
SALES 30 SALESMAN ALLEN
SALES 30 SALESMAN WARD
SALES 30 SALESMAN TURNER
SALES 30 PRESIDENT -
OPERATIONS 40 CLERK -
OPERATIONS 40 PRESIDENT -
OPERATIONS 40 ANALYST -
OPERATIONS 40 SALESMAN -
OPERATIONS 40 MANAGER - {noformat}
Note that there are 5 additional rows for department 40 (one for each job) but
also additional rows for the other departments.
The following query reverses the tables and uses a right join. It returns
exactly the same results as the previous query. Note that "{{partition by}}"
occurs before "{{right join}}".
{noformat}
select d.dname, d.deptno, e.job, e.ename
from emp e
partition by (e.job)
right join dept d
on e.deptno = d.deptno
order by 2;
{noformat}
> Partitioned outer join (LEFT JOIN ... PARTITION BY and PARTITION BY ... RIGHT
> JOIN)
> -----------------------------------------------------------------------------------
>
> Key: CALCITE-6629
> URL: https://issues.apache.org/jira/browse/CALCITE-6629
> Project: Calcite
> Issue Type: Bug
> Reporter: Julian Hyde
> Priority: Major
>
> Partitioned outer join is a feature, generalizing outer join, that generates
> missing rows so that there is at least one row for each value of a
> partitioning column. For example,
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from dept d
> left join emp e
> partition by (e.job)
> on e.deptno = d.deptno
> order by 2;
> DNAME DEPTNO JOB ENAME
> ========== ====== ========= ======
> ACCOUNTING 10 PRESIDENT KING
> ACCOUNTING 10 CLERK MILLER
> ACCOUNTING 10 MANAGER CLARK
> ACCOUNTING 10 SALESMAN -
> ACCOUNTING 10 ANALYST -
> RESEARCH 20 MANAGER JONES
> RESEARCH 20 CLERK SMITH
> RESEARCH 20 CLERK ADAMS
> RESEARCH 20 SALESMAN -
> RESEARCH 20 ANALYST FORD
> RESEARCH 20 ANALYST SCOTT
> RESEARCH 20 PRESIDENT -
> SALES 30 CLERK JAMES
> SALES 30 ANALYST -
> SALES 30 MANAGER BLAKE
> SALES 30 SALESMAN MARTIN
> SALES 30 SALESMAN ALLEN
> SALES 30 SALESMAN WARD
> SALES 30 SALESMAN TURNER
> SALES 30 PRESIDENT -
> OPERATIONS 40 CLERK -
> OPERATIONS 40 PRESIDENT -
> OPERATIONS 40 ANALYST -
> OPERATIONS 40 SALESMAN -
> OPERATIONS 40 MANAGER - {noformat}
> Note that there are 5 additional rows for department 40 (one for each job)
> but also additional rows for the other departments.
> The following query reverses the tables and uses a right join. It returns
> exactly the same results as the previous query. Note that "{{{}partition
> by{}}}" occurs before "{{{}right join{}}}".
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from emp e
> partition by (e.job)
> right join dept d
> on e.deptno = d.deptno
> order by 2;
> {noformat}
> If the non-null-generating side is a subset, the partition nevertheless takes
> the values of the whole table. For example the following returns a row for
> each job, even though there is no SALESMAN or ANALYST in department 10:
> {noformat}
> select d.dname, d.deptno, e.job, e.ename
> from (select * from dept where deptno = 10) d
> left join emp e
> partition by (e.job)
> on e.deptno = d.deptno;
> DNAME DEPTNO JOB ENAME
> ========== ====== ========= ======
> ACCOUNTING 10 PRESIDENT KING
> ACCOUNTING 10 CLERK MILLER
> ACCOUNTING 10 MANAGER CLARK
> ACCOUNTING 10 SALESMAN -
> ACCOUNTING 10 ANALYST -
> {noformat}
> Null can be a partitioning value (i.e. semantics are the same as {{GROUP
> BY}}):
> {noformat}
> select d.dname, d.deptno, e.job, e.ename, e.d1
> from dept d
> left join (select emp.*, nullif(deptno, 20) as d1 from emp) e
> partition by (e.d1)
> on e.deptno = d.deptno
> order by 2;
> DNAME DEPTNO JOB ENAME D1
> ========== ====== ========= ====== ==
> ACCOUNTING 10 PRESIDENT KING 10
> ACCOUNTING 10 CLERK MILLER 10
> ACCOUNTING 10 MANAGER CLARK 10
> ACCOUNTING 10 - - 30
> ACCOUNTING 10 - - -
> RESEARCH 20 MANAGER JONES 20
> RESEARCH 20 CLERK SMITH 20
> RESEARCH 20 CLERK ADAMS 20
> RESEARCH 20 ANALYST FORD 20
> RESEARCH 20 ANALYST SCOTT 20
> RESEARCH 20 - - 10
> RESEARCH 20 - - 30
> SALES 30 CLERK JAMES 30
> SALES 30 MANAGER BLAKE 30
> SALES 30 SALESMAN MARTIN 30
> SALES 30 SALESMAN ALLEN 30
> SALES 30 SALESMAN WARD 30
> SALES 30 SALESMAN TURNER 30
> SALES 30 - - 10
> SALES 30 - - -
> OPERATIONS 40 - - 10
> OPERATIONS 40 - - 30
> OPERATIONS 40 - - -{noformat}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)