OK, so you think you know relational algebra... I am working on a rewrite for IN[1] (and sub-queries in general) that works as a RelNode-to-RelNode transform, not during SQL-to-Rel conversion.
I need a general purpose rewrite for IN. It must work in the presence of null values, and will produce 3-value logic results (true, false, null). Consider the query select deptno, deptno in (select deptno from emp) from dept; and suppose that deptno may be null. Therefore we need to deal with the cases like "null in (10, 20)", "10 in (null, 20)", "20 in ()", "30 in (30, 40)". The best rewrite I can find is select dept.deptno, case when ct.c = 0 then false when dt.i is not null then true when dept.deptno is null then null when ct.ck < ct.c then null else false end from dept cross join (select count(*) as c, count(deptno) as ck from emp) as ct left join (select distinct deptno, true as i from emp) as dt on dept.deptno = dt.deptno; (The above rewrite is already in Calcite, and is tested thoroughly in subquery.oq.) Can you do better? It is incredible to me that it needs TWO passes over the emp table (one to figure out whether there are any values, or any null values). And this is an uncorrelated query, which is supposed to be simple! Also, any ideas for using functions other than count, say something like postgres' bool_or, that could quit when they find the first matching value. Given the general purpose rewrite, I think I can generalize it to composite IN (IN with key arity > 1, "(x, y) IN ...") and EXISTS (which is IN with key arity 0), and simplify it for where we can treat a null result as false, or where we know the key is never null. Julian [1] https://issues.apache.org/jira/browse/CALCITE-816
