Re: [PR] #17972 Restore case expr/expr optimisation while ensuring lazy evaluation [datafusion]

via GitHub Fri, 17 Oct 2025 09:46:55 -0700


findepi commented on code in PR #17973:
URL: https://github.com/apache/datafusion/pull/17973#discussion_r2419023180



##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -431,6 +428,15 @@ impl CaseExpr {
             _ => Cow::Owned(prep_null_mask_filter(when_value)),
         };
 
+        let true_count = when_value.true_count();
+        if true_count == batch.num_rows() {
+            // Avoid evaluate_selection when all rows are true
+            return self.when_then_expr[0].1.evaluate(batch);
+        } else if true_count == 0 {
+            // Avoid evaluate_selection when all rows are false/null
+            return self.else_expr.as_ref().unwrap().evaluate(batch);
+        }
+

Review Comment:
   I agree with the change in `evaluate_selection`
   Now that `evaluate_selection` is changed, do we need those lines here?
   
   (They look nice but my only concern is additional code complexity which is 
harder to cover with SLT tests. Now that we have branching here, we should have 
a bunch of SLT cases that clearly exercise all-true, all-false, some-true 
situations.)



##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -97,14 +97,26 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug 
+ DynEq + DynHash {
         batch: &RecordBatch,
         selection: &BooleanArray,
     ) -> Result<ColumnarValue> {
-        let tmp_batch = filter_record_batch(batch, selection)?;
+        let selection_count = selection.true_count();
 
-        let tmp_result = self.evaluate(&tmp_batch)?;

Review Comment:
   There is a lot of new code which repeats logic of `filter_record_batch`.
   What if we just changed this line only?
   
   ```rust
   let tmp_result =  if tmp_batch.is_empty {
     // Do not call `evaluate` when the selection is empty.
               // When `evaluate_selection` is being used for conditional, lazy 
evaluation,
               // evaluating an expression for a false selection vector may end 
up unintentionally
               // evaluating a fallible expression.
               let datatype = self.data_type(batch.schema_ref().as_ref())?;
               ColumnarValue::Array(make_builder(&datatype, 0).finish())
   } else {
     self.evaluate(&tmp_batch)?;
   }
   ```
   
   Note how this does not inspect `selection` / `selection_count` directly, 
leveraging the work done by `filter_record_batch`.
   
   



##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -97,14 +97,26 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug 
+ DynEq + DynHash {
         batch: &RecordBatch,
         selection: &BooleanArray,
     ) -> Result<ColumnarValue> {
-        let tmp_batch = filter_record_batch(batch, selection)?;
+        let selection_count = selection.true_count();
 
-        let tmp_result = self.evaluate(&tmp_batch)?;
+        if batch.num_rows() == 0 || selection_count == batch.num_rows() {

Review Comment:
   ```suggestion
           if selection_count == batch.num_rows() {
   ```
   
   (i'd assume `batch.num_rows() == 0` implies that also `selection_count == 
0`. If it is not the case, we have a more permissive `if` condition, but 
requires a code comment)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] #17972 Restore case expr/expr optimisation while ensuring lazy evaluation [datafusion]

Reply via email to