Currently, Expressions (used to specify dataset filters and projections)
are simplified by direct rewriting: a filter such as
`alpha == 2 and beta > 3`
on a partition where we are guaranteed that `beta == 5` will be rewritten
to `alpha == 2` before evaluation against scanned batches. This can
potentially occur for each scanned batch: for example, Parquet's row group
statistics are used in the same way to simplify filters.
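
As a rough illustration of the rewriting described above, here is a minimal Python sketch. The `Field`, `Literal`, `Compare` and `And` classes and the `simplify` function are hypothetical stand-ins invented for this example, not the project's actual Expression API; only `==` and `>` are handled.

```python
import operator
from dataclasses import dataclass

# Hypothetical, minimal expression types (illustration only).
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class Compare:
    op: str  # "==" or ">"
    left: object
    right: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

OPS = {"==": operator.eq, ">": operator.gt}

def _is_literal(expr, value):
    # Identity check so that Literal(1) is not mistaken for Literal(True).
    return isinstance(expr, Literal) and expr.value is value

def simplify(expr, guarantees):
    """Return a NEW expression with guaranteed fields substituted and
    constant comparisons folded; `guarantees` maps field name -> value."""
    if isinstance(expr, Field):
        if expr.name in guarantees:
            return Literal(guarantees[expr.name])
        return expr
    if isinstance(expr, Compare):
        left = simplify(expr.left, guarantees)
        right = simplify(expr.right, guarantees)
        if isinstance(left, Literal) and isinstance(right, Literal):
            return Literal(OPS[expr.op](left.value, right.value))
        return Compare(expr.op, left, right)
    if isinstance(expr, And):
        left = simplify(expr.left, guarantees)
        right = simplify(expr.right, guarantees)
        if _is_literal(left, False) or _is_literal(right, False):
            return Literal(False)
        if _is_literal(left, True):
            return right
        if _is_literal(right, True):
            return left
        return And(left, right)
    return expr  # Literal: nothing to do

# `alpha == 2 and beta > 3` on a partition where `beta == 5`
filter_expr = And(
    Compare("==", Field("alpha"), Literal(2)),
    Compare(">", Field("beta"), Literal(3)),
)
simplified = simplify(filter_expr, {"beta": 5})
```

Here `simplified` is a different object from `filter_expr`, and in this sketch the pass would be repeated for every batch that carries its own guarantees.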

Rewriting is not extremely expensive (a microbenchmark estimate on
my machine shows that a simple case such as the above takes 4ms).
However, it does complicate execution of a logical plan, since the
expressions being evaluated are no longer identical to the expressions
with which the plan was constructed.

It seems it might be preferable to avoid mutating expressions and instead
build a mapping from subexpressions to known values which can be used
by subsequent simplification passes and during execution. I'd have to
benchmark it, but it also seems like we might speed up expression
simplification this way. Any thoughts?
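
One possible shape for the mapping idea, as a minimal Python sketch under stated assumptions: the expression classes, `bind_known` and `evaluate` are hypothetical names made up for this example, and a plain dict stands in for a scanned batch. The expression is never rewritten; facts about its subexpressions are recorded in a separate dict keyed by the (hashable) subexpression.

```python
import operator
from dataclasses import dataclass

# Hypothetical, minimal expression types (illustration only).
@dataclass(frozen=True)
class Field:
    name: str

@dataclass(frozen=True)
class Literal:
    value: object

@dataclass(frozen=True)
class Compare:
    op: str  # "==" or ">"
    left: object
    right: object

@dataclass(frozen=True)
class And:
    left: object
    right: object

OPS = {"==": operator.eq, ">": operator.gt}
UNKNOWN = object()  # sentinel: value not determined by the guarantees

def bind_known(expr, guarantees, known):
    """Walk `expr` bottom-up and record in `known` every subexpression
    whose value follows from `guarantees` (field name -> value).
    Returns that value, or UNKNOWN. `expr` itself is left untouched."""
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Field):
        if expr.name not in guarantees:
            return UNKNOWN
        value = guarantees[expr.name]
    elif isinstance(expr, Compare):
        left = bind_known(expr.left, guarantees, known)
        right = bind_known(expr.right, guarantees, known)
        if left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        value = OPS[expr.op](left, right)
    else:  # And: visit both sides so every known subexpression is recorded
        left = bind_known(expr.left, guarantees, known)
        right = bind_known(expr.right, guarantees, known)
        if left is False or right is False:
            value = False
        elif left is True and right is True:
            value = True
        else:
            return UNKNOWN
    known[expr] = value
    return value

def evaluate(expr, row, known):
    """Evaluate the ORIGINAL expression, consulting `known` first."""
    if expr in known:
        return known[expr]
    if isinstance(expr, Literal):
        return expr.value
    if isinstance(expr, Field):
        return row[expr.name]
    if isinstance(expr, Compare):
        return OPS[expr.op](evaluate(expr.left, row, known),
                            evaluate(expr.right, row, known))
    return evaluate(expr.left, row, known) and evaluate(expr.right, row, known)

filter_expr = And(
    Compare("==", Field("alpha"), Literal(2)),
    Compare(">", Field("beta"), Literal(3)),
)
known = {}
bind_known(filter_expr, {"beta": 5}, known)
# The row carries no `beta` column; its value comes from `known`.
matches = evaluate(filter_expr, {"alpha": 2}, known)
```

In this sketch the plan keeps a single `filter_expr` throughout, and later passes or per-batch statistics could extend `known` rather than produce a new expression. Whether this is actually faster than rewriting would still need to be benchmarked.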
