alamb opened a new issue #1070:
URL: https://github.com/apache/arrow-datafusion/issues/1070


   
   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   A classic part of query optimization is algebraic transformations such as 
partially evaluating expressions once at plan time rather than over and over 
for each row during execution time.
   
   For example, a predicate such as `where time < 
date_trunc('2021-10-04Z10:12:13', 'year')` can be rewritten to `where time < 
'2021-01-01Z00:00:00'` which both saves many redundant evaluations of the 
`date_trunc` functions and *also* unlocks additional optimizations such as 
parquet row group pruning and using constant comparison kernels.
   
   DataFusion has a basic constant folding implementation here: 
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/constant_folding.rs
   
   However, as implemented, it has a few drawbacks:
   1. It only covers a few algebraic transformations (like boolean algebra and 
`now()` expansion)
   2. It effectively is a second implementation of expression evaluation so
   2a. As new expression support is added, it would also need to be added to 
constant_folding.rs
   2b. It runs the risk of producing different answers than if the expression 
had been evaluated at runtime
   3. It does not handle functions and sorting out how to support it for user 
defined functions would be non trivial
   
   **Describe the solution you'd like**
   
   Reuse the existing expression evaluation framework (namely 
`PhysicalExpr::evaluate` and everything in `physical_plan/expressions`) to 
implement constant folding.
   
   This would be beneficial because:
   1. All current and future expression types could be evaluated (including 
user defined functions)
   2. It would allow more sophisticated expression transformations / 
optimziations such as https://github.com/apache/arrow-datafusion/pull/1066
   
   The high level idea would be to walk the `Expr` tree bottom up, and if a 
subtree contained only constants (and non volitalie functions 
https://github.com/apache/arrow-datafusion/issues/1069) create and run a 
`PhysicalExpr` to produce a single value, and then replace the subtree with 
that appropriate constant.
   
   
   **Describe alternatives you've considered**
   I think it is possible to implement expression evaluation as a set of 
rewrite rules (as is partially done in #1066) but that still has the downside 
that the
   
   **Additional context**
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to