I’m trying to figure out how to attack the NOT IN bug, 
https://issues.apache.org/jira/browse/OPTIQ-373  (I thought I’d addressed it — 
see JdbcTest.testNotInQueryWithNull — but I wrote the test case wrong, and we 
do indeed have a problem.)

I’d like to translate NOT IN similarly to how we translate IN, EXISTS, NOT 
EXISTS and scalar sub-queries (i.e. to some combination of a join, probably an 
outer join, on top of an aggregate). Then have a rule that recognizes that 
pattern and translates to SemiJoin.

That seems more roundabout than going straight to semi-join, but (1) it will 
allow NOT IN queries to run on back-ends that do not have an implementation of 
SemiJoin, and (2) it should allow us to do decorrelation, (3) it should allow 
us to represent cases that cannot be represented as semi-join, e.g. ‘(x NOT IN 
subquery) IS UNKNOWN OR otherCondition'

There are a bunch of ways to translate the query where you first check then 
whether the sub-query returns NULL then, then run again using 2-valued boolean 
semantics.

For example, the query

SELECT * FROM d
WHERE d NOT IN (
  SELECT d FROM e)

can be translated to

SELECT * FROM d
WHERE NOT EXISTS (SELECT 1 FROM e WHERE deptno IS NULL)
AND deptno IN (SELECT deptno FROM e WHERE deptno IS NOT NULL)

But in so doing, we have duplicated the sub-query. That places a big burden on 
the rule that would recognize the two identical sub-queries and convert to some 
variant of semi-join.

So, is there a translation that only uses one copy of the query?

Julian

Reply via email to