adriangb commented on code in PR #19434:
URL: https://github.com/apache/datafusion/pull/19434#discussion_r2640285780


##########
datafusion/pruning/src/pruning_predicate.rs:
##########
@@ -466,18 +466,12 @@ impl PruningPredicate {
         // In particular this unravels any `DynamicFilterPhysicalExpr`s by 
snapshotting them
         // so that PruningPredicate can work with a static expression.
         let tf = snapshot_physical_expr_opt(expr)?;
-        if tf.transformed {
-            // If we had an expression such as Dynamic(part_col < 5 and col < 
10)
-            // (this could come from something like `select * from t order by 
part_col, col, limit 10`)
-            // after snapshotting and because `DynamicFilterPhysicalExpr` 
applies child replacements to its
-            // children after snapshotting and previously 
`replace_columns_with_literals` may have been called with partition values
-            // the expression we have now is `8 < 5 and col < 10`.
-            // Thus we need as simplifier pass to get `false and col < 10` => 
`false` here.
-            let simplifier = PhysicalExprSimplifier::new(&schema);
-            expr = simplifier.simplify(tf.data)?;
-        } else {
-            expr = tf.data;
-        }
+        // Always simplify the expression to handle cases like IS NULL/IS NOT 
NULL on literals,
+        // constant expressions, and expressions that may have been 
transformed during snapshotting
+        // (e.g., DynamicFilterPhysicalExpr with partition value replacements 
like `8 < 5` => `false`).
+        // The simplifier is efficient and returns early if nothing can be 
simplified.
+        let simplifier = PhysicalExprSimplifier::new(&schema);
+        expr = simplifier.simplify(tf.data)?;

Review Comment:
   This adds overhead possibly for cases where it's not needed.
   
   The struggle we face is that for cases where we don't transform the physical 
expression, i.e. it is exactly what's created from a logical expression, there 
is no point in simplifying, we have simplification at the logical expression 
layer.
   
   But on the other hand forcing callers to call simplify for best results 
is... an annoying API.
   
   Maybe if we benchmarked the simplifier to be very fast on complex 
expressions that don't require simplification we could justify always calling 
it?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to