> On Jul 28, 2015, at 4:06 PM, Vladimir Sitnikov <[email protected]> 
> wrote:
> 
>> But if you solve the general problem,
> 
> That is important. However, as we solve that "general" problem, we
> should take into account that "downstream engines" might support
> 3-valued logic.
> For instance, OracleDB has null-aware anti-join. In other words, "not
> in" just works yet still supports 3-valued logic behind the scenes.
> 
> My point #1 is:
> 1.1) When I approach IN/NOT IN kind of queries, I typically add NOT
> NULL everywhere. This solves 3-valued problem. As you add "not null"
> you identify the required handling of nulls in a particular query.
> 1.2) As Calcite focuses on query optimization, it should probably
> treat "the most typical cases" first. For instance, it shouldn't
> convert every join to cross join+count "just in case" there will be
> nulls.

I have to do both — run every query correctly, and run the common cases 
quickly. I am finding that I can optimize quite a lot, and easily, by starting 
from the general case and removing every “is not null” that I know at compile 
time will always evaluate to true.

There is a further optimization where EXISTS and IN turn into a semi-join. The 
EXISTS or IN expressions are completely removed from the WHERE clause - because 
if they had failed the row would have been removed already. That is a more 
difficult optimization to make.

Null-aware anti-join would be a nice feature to add. Hive has null-safe join 
also. The present framework could probably accommodate it, but I am not 
thinking about it right now.

There are also big gains to be had in decorrelation. I am not looking at 
decorrelation right now.

> 
>> So, Oracle is compliant with my understanding of the standard.
> 
> Thanks for taking time and explaining this.
> I just tried to provoke some conversation over a well-known
> hard-to-reason-about problem.
> I was inspired with your topic and tried a couple of
> never-ask-in-an-interview queries.

It would be great if you could add some of your nasty queries to subquery.oq. 
Test coverage will help us stay on the right path.

If you write a query and say “This ought to be done using a plan that uses just 
a semi-join” then I can either refute your assertion or implement the necessary 
planner rule.

Julian

Reply via email to