adriangb commented on code in PR #19434:
URL: https://github.com/apache/datafusion/pull/19434#discussion_r2640285780
##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -466,18 +466,12 @@ impl PruningPredicate {
// In particular this unravels any `DynamicFilterPhysicalExpr`s by
snapshotting them
// so that PruningPredicate can work with a static expression.
let tf = snapshot_physical_expr_opt(expr)?;
- if tf.transformed {
- // If we had an expression such as Dynamic(part_col < 5 and col <
10)
- // (this could come from something like `select * from t order by
part_col, col, limit 10`)
- // after snapshotting and because `DynamicFilterPhysicalExpr`
applies child replacements to its
- // children after snapshotting and previously
`replace_columns_with_literals` may have been called with partition values
- // the expression we have now is `8 < 5 and col < 10`.
- // Thus we need as simplifier pass to get `false and col < 10` =>
`false` here.
- let simplifier = PhysicalExprSimplifier::new(&schema);
- expr = simplifier.simplify(tf.data)?;
- } else {
- expr = tf.data;
- }
+ // Always simplify the expression to handle cases like IS NULL/IS NOT
NULL on literals,
+ // constant expressions, and expressions that may have been
transformed during snapshotting
+ // (e.g., DynamicFilterPhysicalExpr with partition value replacements
like `8 < 5` => `false`).
+ // The simplifier is efficient and returns early if nothing can be
simplified.
+ let simplifier = PhysicalExprSimplifier::new(&schema);
+ expr = simplifier.simplify(tf.data)?;
Review Comment:
This adds overhead possibly for cases where it's not needed.
The struggle we face is that for cases where we don't transform the physical
expression, i.e. it is exactly what's created from a logical expression, there
is no point in simplifying, we have simplification at the logical expression
layer.
But on the other hand forcing callers to call simplify for best results
is... an annoying API.
Maybe if we benchmarked the simplifier to be very fast on complex
expressions that don't require simplification we could justify always calling
it?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]