For T(p) and F(p), did you know that SQL has IS TRUE and IS FALSE? I find them very useful. For example, if I write “select * from t where x > 5” it is implicitly “select * from t where (x > 5) is true”. That takes care of 3-valued logic, because if “x" is null then “x > 5” is unknown and “(x > 5) is true” is false.
Regarding deriving additional filters. I am cautious about adding too many derived predicates to RelMdPredicates. The number of derived predicates is exponential in the size of the original predicate, so we are placing a burden on the downstream RelNodes that will need to propagate those predicates. > On Apr 27, 2018, at 4:30 AM, Zhong Yu <[email protected]> wrote: > > In org.apache.calcite.rel.rules.FilterJoinRule, > there's the following comment > > // TODO - add logic to derive additional filters. E.g., from > // (t1.a = 1 AND t2.a = 2) OR (t1.b = 3 AND t2.b = 4), you can > // derive table filters: > // (t1.a = 1 OR t1.b = 3) > // (t2.a = 2 OR t2.b = 4) > > When perf-testing TPC-H/DS queries, we found that such additional filters > would have been very profitable in several cases. > > Has anybody worked on this issue or is there already a util to that effect? > > The following algorithm is what I am thinking; please comment if interested. > > Given a predicate p, we want to derive an implied predicate q, i.e. p->q, > and q only references columns on a specified side of a join. Then we can > transform p to AND(p, q); q will be pushed down at a later stage. > > Since we are dealing with 3-value booleans, we need to be very careful. > To be precise, we need a function T(p) which returns a predicate such that > for any row, if p evaluates to TRUE, T(p) must also evaluate to TRUE. > To handle NOT, we also need a function F(p) such that for any row, > if p evaluates to FALSE, F(p) must also evaluate to FALSE. > > T(p) can be defined as such -- > > T( p ) => p, if p only references a specified side of a join > T( OR(a,b) ) => OR(T(a), T(b)) > T( AND(a,b) ) => AND(T(a), T(b)) > T( NOT(p) ) => NOT( F(p) ) > T( p ) => TRUE, for all other cases, as the fallback > > // not a present concern, but more rules can be plugged into T(p) if > // desired in some applications, e.g. T( x*x+y*y<100 ) => x*x<100 > > F(p) is defined similarly -- > > F( p ) => p, if p only references a specified side of a join > F( OR(a,b) ) => OR(F(a), F(b)) > F( AND(a,b) ) => AND(F(a), F(b)) > F( NOT(p) ) => NOT( T(p) ) > F( p ) => FALSE, for all other cases, as the fallback > > We can unify T(p) and F(p) as one function X(v,p), where v is TRUE/FALSE. > For any row, if p evaluates to v, then X(v,p) also evaluates to v. > > (The case of UNKNOWN seems very difficult, I'm not considering it here.)
