> On Jul 28, 2015, at 4:06 PM, Vladimir Sitnikov <[email protected]> > wrote: > >> But if you solve the general problem, > > That is important. However, as we solve that "general" problem, we > should take into account that "downstream engines" might support > 3-valued logic. > For instance, OracleDB has null-aware anti-join. In other words, "not > in" just works yet still supports 3-valued logic behind the scenes. > > My point #1 is: > 1.1) When I approach IN/NOT IN kind of queries, I typically add NOT > NULL everywhere. This solves 3-valued problem. As you add "not null" > you identify the required handling of nulls in a particular query. > 1.2) As Calcite focuses on query optimization, it should probably > treat "the most typical cases" first. For instance, it shouldn't > convert every join to cross join+count "just in case" there will be > nulls.
I have to do both — run every query correctly, and run the common cases quickly. I am finding that I can optimize quite a lot, and easily, by starting from the general case and removing every “is not null” that I know at compile time will always evaluate to true. There is a further optimization where EXISTS and IN turn into a semi-join. The EXISTS or IN expressions are completely removed from the WHERE clause - because if they had failed the row would have been removed already. That is a more difficult optimization to make. Null-aware anti-join would be a nice feature to add. Hive has null-safe join also. The present framework could probably accommodate it, but I am not thinking about it right now. There are also big gains to be had in decorrelation. I am not looking at decorrelation right now. > >> So, Oracle is compliant with my understanding of the standard. > > Thanks for taking time and explaining this. > I just tried to provoke some conversation over a well-known > hard-to-reason-about problem. > I was inspired with your topic and tried a couple of > never-ask-in-an-interview queries. It would be great if you could add some of your nasty queries to subquery.oq. Test coverage will help us stay on the right path. If you write a query and say “This ought to be done using a plan that uses just a semi-join” then I can either refute your assertion or implement the necessary planner rule. Julian
