pepijnve commented on code in PR #19434:
URL: https://github.com/apache/datafusion/pull/19434#discussion_r2643974414
##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -466,18 +466,12 @@ impl PruningPredicate {
// In particular this unravels any `DynamicFilterPhysicalExpr`s by
snapshotting them
// so that PruningPredicate can work with a static expression.
let tf = snapshot_physical_expr_opt(expr)?;
- if tf.transformed {
- // If we had an expression such as Dynamic(part_col < 5 and col <
10)
- // (this could come from something like `select * from t order by
part_col, col, limit 10`)
- // after snapshotting and because `DynamicFilterPhysicalExpr`
applies child replacements to its
- // children after snapshotting and previously
`replace_columns_with_literals` may have been called with partition values
- // the expression we have now is `8 < 5 and col < 10`.
- // Thus we need as simplifier pass to get `false and col < 10` =>
`false` here.
- let simplifier = PhysicalExprSimplifier::new(&schema);
- expr = simplifier.simplify(tf.data)?;
- } else {
- expr = tf.data;
- }
+ // Always simplify the expression to handle cases like IS NULL/IS NOT
NULL on literals,
+ // constant expressions, and expressions that may have been
transformed during snapshotting
+ // (e.g., DynamicFilterPhysicalExpr with partition value replacements
like `8 < 5` => `false`).
+ // The simplifier is efficient and returns early if nothing can be
simplified.
+ let simplifier = PhysicalExprSimplifier::new(&schema);
+ expr = simplifier.simplify(tf.data)?;
Review Comment:
I was playing around with a simple microbenchmark that measured
simplification of `(c1 < 100 or c1 > 200) and (c2 < 100 or c2 > 200)`. A lot of
time was being spent trying to simplify the literals, which is a bit wasteful.
This little addition resulted in about `-86%` on the microbenchmark. Overhead
is then about `2.39 µs` on my laptop.
@adriangb do you have some example of other typical 'complex' expression
that would be useful to benchmark?
```
--- a/datafusion/physical-expr/src/simplifier/mod.rs (revision
2e160e07e993deee2ff6f93bce6d7dd5f4b7a198)
+++ b/datafusion/physical-expr/src/simplifier/mod.rs (date 1766507550524)
@@ -54,6 +54,10 @@
while count < MAX_LOOP_COUNT {
count += 1;
let result = current_expr.transform(|node| {
+ if node.as_any().downcast_ref::<Literal>().is_some() {
+ return Ok(Transformed::no(node));
+ }
+
#[cfg(test)]
let original_type = node.data_type(schema).unwrap();
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]