I am trying to deduce NOT NULL constraints. Consider the query

  SELECT mgr
  FROM Emp
  WHERE empno > 10 AND mgr > empno

which turns into

  Project(mgr)
    Filter(empno > 10 AND mgr > empno)
      Scan(Emp)

The mgr column is declared as a nullable int, but it is clear (to you and me) 
that in this query, mgr is never null. But Calcite cannot currently deduce it.

I want to use RexImplicationChecker to deduce things like this, since it is 
basically a theorem prover. To figure out whether mgr is not nullable, we form 
the predicate “mgr IS NOT NULL”, take the known predicates (“empno > 10”, “mgr 
> 10”), and ask the RexImplicationChecker whether the known predicates imply 
the predicate we want.

The problem is, the predicates on the Filter are [“empno > 10, mgr > empno”], 
but the predicates on the Project are empty. Why? Because you can’t propagate 
predicates to a Project if they reference non-projected columns.

Now, there are some useful predicates we could deduce for the Project, such as 
“mgr > 10” or “mgr IS NOT NULL”, and some less useful ones like “mgr > -1000". 
But should we be deriving such predicates? We could potentially spend a lot of 
time and memory computing predicates that are never used.

How far should we go in speculatively creating predicates? Are there any other 
ways to solve the fact that Project has fewer predicates than it should?

Julian

Reply via email to