Andrew Lamb created ARROW-9770:
----------------------------------

             Summary: [Rust] [DataFusion] Add constant folding to expressions during logical planning
                 Key: ARROW-9770
                 URL: https://issues.apache.org/jira/browse/ARROW-9770
             Project: Apache Arrow
          Issue Type: Improvement
            Reporter: Andrew Lamb


The high-level idea is that if an expression can be partially evaluated at planning time, then
# The execution time will be reduced
# There may be additional optimizations possible (for example, removing entire LogicalPlan nodes)

I recently saw the following selection expression created by [predicate push down|https://github.com/apache/arrow/pull/7880]:

{code}
Selection: #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1) And #a Eq Int64(1) And #b GtEq Int64(1) And #a LtEq Int64(1)
              TableScan: test projection=None
{code}

This could be simplified significantly:
1. Duplicate clauses could be removed (e.g. `#a Eq Int64(1) And #a Eq Int64(1)` --> `#a Eq Int64(1)`)
2. Algebraic simplification could remove implied clauses (e.g. `A <= 5 AND A = 5` is the same as `A = 5`, so `#a LtEq Int64(1) And #a Eq Int64(1)` --> `#a Eq Int64(1)`)
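Both simplifications can be sketched over the list of AND-ed conjuncts from the plan above. This sketch assumes the `Selection` predicate has already been split into its conjuncts. It models them with a hypothetical `Pred` enum that is not part of DataFusion, and it handles only `=`, `<=` and `>=` against integer literals.

```rust
/// Hypothetical conjunct representation: (column name, integer literal).
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    Eq(String, i64),
    LtEq(String, i64),
    GtEq(String, i64),
}

/// Simplify a conjunction `p1 AND p2 AND ...` given as a slice of conjuncts.
fn simplify_conjuncts(conjuncts: &[Pred]) -> Vec<Pred> {
    // Rule 1: drop exact duplicates, keeping the first occurrence.
    let mut out: Vec<Pred> = Vec::new();
    for p in conjuncts {
        if !out.contains(p) {
            out.push(p.clone());
        }
    }

    // Collect `col = v` facts; these conjuncts are never removed below.
    let equalities: Vec<(String, i64)> = out
        .iter()
        .filter_map(|p| match p {
            Pred::Eq(c, v) => Some((c.clone(), *v)),
            _ => None,
        })
        .collect();

    // Rule 2: drop range conjuncts implied by an equality on the same column
    // (`c = ev` implies `c <= v` when ev <= v, and `c >= v` when ev >= v).
    // Removing a conjunct implied by another kept conjunct preserves meaning.
    out.retain(|p| match p {
        Pred::LtEq(c, v) => !equalities.iter().any(|(ec, ev)| ec == c && ev <= v),
        Pred::GtEq(c, v) => !equalities.iter().any(|(ec, ev)| ec == c && ev >= v),
        Pred::Eq(_, _) => true,
    });
    out
}
```

Applied to the six conjuncts in the plan above, this leaves only `#a Eq Int64(1) And #b GtEq Int64(1)`.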

Inspiration can be taken from the PostgreSQL code (clauses.c) that evaluates constant expressions:
https://doxygen.postgresql.org/clauses_8c.html#ac91c4055a7eb3aa6f1bc104479464b28

(In this case, for example, if you have a predicate `A=5`, then you can substitute `5` for `A` in any expression higher up in the plan.)
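A minimal sketch of that substitution follows. It assumes the known constants have already been collected from `col = literal` predicates lower in the plan into a `HashMap`. The `Expr` type and `substitute_and_fold` are hypothetical. A real rule would also have to check that the expression is evaluated only over rows that passed that filter.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal expression tree (NOT DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Add(Box<Expr>, Box<Expr>),
}

/// Replace columns whose value is known (e.g. from a `col = literal` filter
/// below this node) with that literal, then fold any `Add` whose operands
/// have both become literals.
fn substitute_and_fold(expr: &Expr, known: &HashMap<String, i64>) -> Expr {
    match expr {
        Expr::Column(name) => match known.get(name) {
            Some(v) => Expr::Int(*v),
            None => expr.clone(),
        },
        Expr::Int(_) => expr.clone(),
        Expr::Add(l, r) => {
            let (l, r) = (substitute_and_fold(l, known), substitute_and_fold(r, known));
            if let (Expr::Int(a), Expr::Int(b)) = (&l, &r) {
                // Leave the node unevaluated on overflow.
                if let Some(sum) = a.checked_add(*b) {
                    return Expr::Int(sum);
                }
            }
            Expr::Add(Box::new(l), Box::new(r))
        }
    }
}
```

With `a = 5` known from a filter, a projection of `#a + 1` above it becomes the constant `6`, while expressions over other columns are left unchanged.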

Other classic optimizations include rules such as `A OR TRUE` --> `TRUE`, `A AND TRUE` --> `A`, etc.
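These rewrites are local pattern matches on the operands. The following hypothetical sketch applies them bottom-up, using a toy `Expr` rather than DataFusion's own. It also includes the `FALSE` counterparts. The rules stay valid under SQL's three-valued logic, because `NULL OR TRUE` is `TRUE`, `NULL AND FALSE` is `FALSE`, and `NULL AND TRUE` / `NULL OR FALSE` are `NULL`.

```rust
/// Hypothetical boolean expression tree (NOT DataFusion's `Expr`).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Bool(bool),
    And(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
}

/// Simplify children first, then apply the boolean identities at this node.
fn simplify_bool(expr: &Expr) -> Expr {
    match expr {
        Expr::And(l, r) => {
            let (l, r) = (simplify_bool(l), simplify_bool(r));
            match (l, r) {
                // A AND TRUE --> A (literal on either side)
                (Expr::Bool(true), other) | (other, Expr::Bool(true)) => other,
                // A AND FALSE --> FALSE
                (Expr::Bool(false), _) | (_, Expr::Bool(false)) => Expr::Bool(false),
                (l, r) => Expr::And(Box::new(l), Box::new(r)),
            }
        }
        Expr::Or(l, r) => {
            let (l, r) = (simplify_bool(l), simplify_bool(r));
            match (l, r) {
                // A OR TRUE --> TRUE
                (Expr::Bool(true), _) | (_, Expr::Bool(true)) => Expr::Bool(true),
                // A OR FALSE --> A
                (Expr::Bool(false), other) | (other, Expr::Bool(false)) => other,
                (l, r) => Expr::Or(Box::new(l), Box::new(r)),
            }
        }
        // Columns and literals are already simplified.
        other => other.clone(),
    }
}
```

Because children are simplified first, nested cases such as `(A OR FALSE) AND TRUE` collapse to `A` in a single pass.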




--
This message was sent by Atlassian Jira
(v8.3.4#803005)