xinghuayu007 opened a new issue #5505:
URL: https://github.com/apache/incubator-doris/issues/5505
### **1. Background**
When a SQL query is like `select * from table1 where a = 1 and a = 2`, it
contains two predicates: `a = 1`, `a = 2`. Obviously, the two predicates is
conflict. This problem is defined as **Predicate Conflict Problem**. When a
SQL query is like `select * from table1 where a > 1 and a > 2`. Predicate `a >
1` and `a > 2` can be merged into one predicate `a > 2`. This problem is
defined as **Predicate Merge Problem**.
What situation these two problems will happen in?
a. BI tool generate sql
b. complicate sql with predicate pushdown
### **2. Design**
**a. Predicate Conflict Problem**
A WhereClause may be very complicated with many nested layers, such as ((D
or ((A and B) or C))) and E). In order to compute conviently, we can firstly
transform the expression into **Primary Disjunctive Paradigm(主析取范式)**. Primary
Disjunctive Paradigm is like (A and B and C) or (D and E and F) or (G and H and
I). Item `(A and B and C)` is called
**Minimal Item(极小项)**. Every Minimal Item is consist of `AND Expression`.
Therefore if we can check every item in Minimal Item wether is conflict with
each other. If a Minimal Item has conflicts, it is false. if all Minimal Items
are false, the whole expression is false.
**b. Predicate Merge Problem**
If two predicates can be merged, it means they share the same column, and
the predicate is like `column > constant` compounded with one column and one
constant. There no every good methods to solve this problem. Presto traverses
every sub-expression from bottom to top and merge every expression connected
with `and` or `or`. If two expression can not be merged, suck as `a > 1 or b >
2`, it returns `{all}` to represet a no limit boundary.
We can compute the expression based on the result of Predicate Merge
Problem. First compute every Minimal Item, because it is consist of `and`,
which is easy to handle. Second compute two Minimal Items. It is easy to
compute `and` firstly then `or` than compute `or` then `and`.
### **3. Implement**

**Function Define**
Expr optimizePredicate(Expr whereClause);
Input Param: whereClause
Output Param: return an optimized expression, if it is conflict, return a
False BooleanLiteral expression
**a. Generate Primary Disjunctive Paradigm**
To transform a normal expression into Primary Disjunctive Paradigm, we use
`**True Table(真值表)**` algorithm.
**b. Check Predicate Conflict**
Then we can check every Minimal Item, if one Minimal Item is false, remove
it. If all Minimal Items are false, return a False BooleanLiteral expression.
**c. Merge Predicate**
First compute every Minimal Item, because it is consist of `and`, which is
easy to handle. Second compute two Minimal Items. Last return an optimized
expression.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]