I am trying to deduce NOT NULL constraints. Consider the query
SELECT mgr
FROM Emp
WHERE empno > 10 AND mgr > empno
which turns into
Project(mgr)
Filter(empno > 10 AND mgr > empno)
Scan(Emp)
The mgr column is declared as a nullable int, but it is clear (to you and me)
that in this query, mgr is never null. But Calcite cannot currently deduce it.
I want to use RexImplicationChecker to deduce things like this, since it is
basically a theorem prover. To figure out whether mgr is not nullable, we form
the predicate “mgr IS NOT NULL”, take the known predicates (“empno > 10”, “mgr
> 10”), and ask the RexImplicationChecker whether the known predicates imply
the predicate we want.
The problem is, the predicates on the Filter are [“empno > 10, mgr > empno”],
but the predicates on the Project are empty. Why? Because you can’t propagate
predicates to a Project if they reference non-projected columns.
Now, there are some useful predicates we could deduce for the Project, such as
“mgr > 10” or “mgr IS NOT NULL”, and some less useful ones like “mgr > -1000".
But should we be deriving such predicates? We could potentially spend a lot of
time and memory computing predicates that are never used.
How far should we go in speculatively creating predicates? Are there any other
ways to solve the fact that Project has fewer predicates than it should?
Julian