alamb opened a new issue #1070:
URL: https://github.com/apache/arrow-datafusion/issues/1070
**Is your feature request related to a problem or challenge? Please describe
what you are trying to do.**
A classic part of query optimization is algebraic transformations such as
partially evaluating expressions once at plan time rather than over and over
for each row during execution time.
For example, a predicate such as `where time <
date_trunc('2021-10-04Z10:12:13', 'year')` can be rewritten to `where time <
'2021-01-01Z00:00:00'` which both saves many redundant evaluations of the
`date_trunc` functions and *also* unlocks additional optimizations such as
parquet row group pruning and using constant comparison kernels.
DataFusion has a basic constant folding implementation here:
https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/optimizer/constant_folding.rs
However, as implemented, it has a few drawbacks:
1. It only covers a few algebraic transformations (like boolean algebra and
`now()` expansion)
2. It effectively is a second implementation of expression evaluation so
2a. As new expression support is added, it would also need to be added to
constant_folding.rs
2b. It runs the risk of producing different answers than if the expression
had been evaluated at runtime
3. It does not handle functions and sorting out how to support it for user
defined functions would be non trivial
**Describe the solution you'd like**
Reuse the existing expression evaluation framework (namely
`PhysicalExpr::evaluate` and everything in `physical_plan/expressions`) to
implement constant folding.
This would be beneficial because:
1. All current and future expression types could be evaluated (including
user defined functions)
2. It would allow more sophisticated expression transformations /
optimziations such as https://github.com/apache/arrow-datafusion/pull/1066
The high level idea would be to walk the `Expr` tree bottom up, and if a
subtree contained only constants (and non volitalie functions
https://github.com/apache/arrow-datafusion/issues/1069) create and run a
`PhysicalExpr` to produce a single value, and then replace the subtree with
that appropriate constant.
**Describe alternatives you've considered**
I think it is possible to implement expression evaluation as a set of
rewrite rules (as is partially done in #1066) but that still has the downside
that the
**Additional context**
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]