OK, so you think you know relational algebra...

I am working on a rewrite for IN[1] (and sub-queries in general) that
works as a RelNode-to-RelNode transform, not during SQL-to-Rel
conversion.

I need a general purpose rewrite for IN. It must work in the presence
of null values, and will produce 3-value logic results (true, false,
null).

Consider the query

select deptno,
 deptno in (select deptno from emp)
from dept;

and suppose that deptno may be null. Therefore we need to deal with
the cases like "null in (10, 20)", "10 in (null, 20)", "20 in ()", "30
in (30, 40)".

The best rewrite I can find is

select dept.deptno,
  case
  when ct.c = 0 then false
  when dt.i is not null then true
  when dept.deptno is null then null
  when ct.ck < ct.c then null
  else false
  end
from dept
cross join (select count(*) as c, count(deptno) as ck from emp) as ct
left join (select distinct deptno, true as i from emp) as dt
  on dept.deptno = dt.deptno;

(The above rewrite is already in Calcite, and is tested thoroughly in
subquery.oq.)

Can you do better?

It is incredible to me that it needs TWO passes over the emp table
(one to figure out whether there are any values, or any null values).
And this is an uncorrelated query, which is supposed to be simple!

Also, any ideas for using functions other than count, say something
like postgres' bool_or, that could quit when they find the first
matching value.

Given the general purpose rewrite, I think I can generalize it to
composite IN (IN with key arity > 1, "(x, y) IN ...") and EXISTS
(which is IN with key arity 0), and simplify it for where we can treat
a null result as false, or where we know the key is never null.

Julian

[1] https://issues.apache.org/jira/browse/CALCITE-816

Reply via email to