Here’s the logical plan:
Project(a.*)
Filter(condition: “notIn IS TRUE")
GeneralizedSemiJoin(condition=‘left.a=right.b’, computes={notIn: “!= ALL”})
Scan(foo)
Scan(bar)
The “notIn” column is a 3-valued boolean column computed for each record from
the left input.
GeneralizedSemiJoin would be able to compute columns such as “!= ALL”, “!=
ANY”, “< ALL”, etc. Each returns TRUE/FALSE/UNKOWN.
There may be multiple such columns.
A physical implementation would be able to apply filters, e.g. "notIn IS TRUE
OR notIn IS UNKNOWN”. And therefore potentially return fewer rows, or convert
itself to an ordinary inner semi-join.
> On Jul 24, 2020, at 10:05 PM, Haisheng Yuan <[email protected]> wrote:
>
> I am not sure I get your idea.
> What will the logical plan and physical plan look like for the following
> query?
> SELECT * FROM foo WHERE a NOT IN (SELECT b FROM bar); -- bar.b is nullable
>
>
> On 2020/07/23 01:35:44, Julian Hyde <[email protected]> wrote:
>> How about a semi-join algorithm that adds column that hold the nature of the
>> match? This algorithm can be used to evaluate 3-valued IN and 3-valued NOT
>> IN queries.
>>
>> Generalizing further, it could compute any “ANY (predicate)” or “ALL
>> (predicate)” condition as a column with values TRUE/FALSE/UNKNOWN. For
>> instance “empno NOT IN …” is equivalent to “ALL(empno <> …)”.
>>
>> Here is an example.
>>
>> Emp table:
>>
>> empno deptno mgr
>> ===== ====== ====
>> 100 10 NULL
>> 101 10 100
>> 102 20 101
>> 103 30 100
>> 104 30 103
>> 105 40 NULL
>>
>>
>> The semi-join algorithm calculates the following intermediate result SJ:
>>
>> empno deptno mgrs match anyNulls allNulls
>> ===== ====== =========== ===== ======== ========
>> 100 10 [NULL, 100] TRUE TRUE FALSE
>> 101 10 [NULL, 100] FALSE TRUE FALSE
>> 102 20 [101] FALSE FALSE FALSE
>> 103 30 [100, 103] TRUE FALSE FALSE
>> 104 30 [100, 103] FALSE FALSE TRUE
>> 105 40 [NULL] FALSE TRUE TRUE
>>
>> So it can answer the following query:
>>
>> SELECT empno,
>> deptno,
>> empno IN (SELECT mgr FROM Emp
>> WHERE deptno = e.deptno) AS “in",
>> empno NOT IN (SELECT mgr FROM Emp
>> WHERE deptno = e.deptno) AS “notIn"
>> FROM Emp AS e;
>>
>> empno deptno in notIn
>> ===== ====== ======= =====
>> 100 10 TRUE FALSE
>> 101 10 FALSE UNKNOWN
>> 102 20 FALSE TRUE
>> 103 30 TRUE FALSE
>> 104 40 FALSE FALSE
>> 105 50 UNKNOWN UNKNOWN
>>
>> by mapping it to the output of the semi-join algorithm:
>>
>> SELECT empno, deptno,
>> case when allNulls then unknown else match end AS “in”,
>> case when anyNulls then unknown else not match end AS “notIn”
>> FROM SJ
>>
>>
>>
>>
>>
>>
>>
>>
>>> On Jul 22, 2020, at 8:57 AM, Vladimir Sitnikov
>>> <[email protected]> wrote:
>>>
>>> Julian> Vladimir said he *expected* Oracle would implement (3-valued) NOT
>>> IN efficiently. (Back in the day, when I was at Oracle, they certainly did
>>> not.) Does anyone have any evidence that they do?
>>>
>>> Well, Oracle has "null aware" joins since Oracle 11g which is more than 10
>>> years old.
>>> I have not tested the actual performance of the `null-aware join`, however,
>>> it would be extremely surprising, if `null-aware join` was less efficient
>>> than "manually crafted aggregate + join + join + whatever".
>>>
>>> That is why I think it is important to be able to generate regular "not in"
>>> plans.
>>>
>>> Vladimir
>>
>>