Currently, Expressions (used to specify dataset filters and projections) are simplified by direct rewriting: a filter such as `alpha == 2 and beta > 3`, applied to a partition where we are guaranteed that `beta == 5`, is rewritten to `alpha == 2` before it is evaluated against scanned batches. This rewriting can happen as often as once per scanned batch; for example, Parquet's row group statistics are used in the same way to simplify filters.
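For illustration, here is a minimal sketch of what simplification by direct rewriting looks like. The `Field`, `Literal`, and `Call` types and the `simplify` function are hypothetical toy stand-ins, not Arrow's actual classes or API; the point is only that simplification returns a different expression from the one the filter was built with.

```python
from dataclasses import dataclass
import operator

# Hypothetical toy expression types; these are NOT Arrow's classes.
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class Call:
    op: str        # "==", ">", or "and"
    left: object
    right: object

OPS = {"==": operator.eq, ">": operator.gt}

def is_bool_literal(expr, value):
    """True if `expr` is exactly the boolean literal `value`."""
    return isinstance(expr, Literal) and expr.value is value

def simplify(expr, guarantee):
    """Return a NEW expression with fields known from `guarantee` folded away."""
    if isinstance(expr, Field) and expr.name in guarantee:
        return Literal(guarantee[expr.name])
    if isinstance(expr, Call):
        left = simplify(expr.left, guarantee)
        right = simplify(expr.right, guarantee)
        if expr.op == "and":
            # A known-false operand decides the result; known-true ones drop out.
            if is_bool_literal(left, False) or is_bool_literal(right, False):
                return Literal(False)
            if is_bool_literal(left, True):
                return right
            if is_bool_literal(right, True):
                return left
        elif isinstance(left, Literal) and isinstance(right, Literal):
            # Both sides are constants, so fold the comparison.
            return Literal(OPS[expr.op](left.value, right.value))
        return Call(expr.op, left, right)
    return expr

filter_expr = Call("and",
                   Call("==", Field("alpha"), Literal(2)),
                   Call(">", Field("beta"), Literal(3)))

# The partition guarantees beta == 5, so `beta > 3` folds to true and drops out.
simplified = simplify(filter_expr, {"beta": 5})
```

In this sketch `simplified` is the `alpha == 2` comparison alone, while `filter_expr` itself is left as constructed; whatever evaluates the filter now holds a different expression from the one in the plan.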
Rewriting is not extremely expensive (a microbenchmark estimate on my machine shows that a simple case such as the above takes 4ms). However, it does complicate execution of a logical plan, because the expressions actually being evaluated are no longer identical to the expressions with which the plan was constructed. It might be preferable to avoid mutating expressions and instead build a mapping from subexpressions to known values, which subsequent simplification passes and execution could both consult. I'd have to benchmark it, but it also seems like we might speed up expression simplification this way. Any thoughts?
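To make the proposal a bit more concrete, here is a rough sketch of the mapping idea, using the same kind of hypothetical toy expression types (not Arrow's classes). The names `evaluate` and `known` are made up for illustration, and evaluation is done row by row only to keep the example short. The expression is never rewritten; per-partition knowledge is kept in a separate dictionary keyed by subexpression.

```python
from dataclasses import dataclass
import operator

# Hypothetical toy expression types; these are NOT Arrow's classes.
# Frozen dataclasses are hashable, so subexpressions can be dictionary keys.
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class Call:
    op: str        # "==", ">", or "and"
    left: object
    right: object

OPS = {"==": operator.eq, ">": operator.gt, "and": lambda a, b: a and b}

def evaluate(expr, row, known):
    """Evaluate `expr` for one row, consulting `known` before doing any work."""
    if expr in known:
        return known[expr]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Field):
        return row[expr.name]
    return OPS[expr.op](evaluate(expr.left, row, known),
                        evaluate(expr.right, row, known))

beta_gt_3 = Call(">", Field("beta"), Literal(3))
filter_expr = Call("and", Call("==", Field("alpha"), Literal(2)), beta_gt_3)

# Knowledge about this partition lives beside the plan, not inside it.
known = {Field("beta"): 5, beta_gt_3: True}

# `beta` is never read from the rows, because its subexpression is already known.
rows = [{"alpha": 2}, {"alpha": 7}]
results = [evaluate(filter_expr, row, known) for row in rows]
```

Here `filter_expr` is the same object before and after evaluation, so the plan and the evaluated expression stay identical. Switching to another partition would mean swapping in a different `known` mapping rather than producing a new expression.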