pepijnve commented on code in PR #17973:
URL: https://github.com/apache/datafusion/pull/17973#discussion_r2417969832


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -431,6 +428,15 @@ impl CaseExpr {
             _ => Cow::Owned(prep_null_mask_filter(when_value)),
         };
 
+        let true_count = when_value.true_count();
+        if true_count == batch.num_rows() {
+            // Avoid evaluate_selection when all rows are true
+            return self.when_then_expr[0].1.evaluate(batch);
+        } else if true_count == 0 {
+            // Avoid evaluate_selection when all rows are false/null
+            return self.else_expr.as_ref().unwrap().evaluate(batch);
+        }
+

Review Comment:
   > isn't this reintroducing the bug that was fixed in 
https://github.com/apache/datafusion/issues/15384, just in a big more complex 
wrapping?
   
   No, why do you think that's the case? If you write `case foo is not null 
then foo else 1/0 end` and `foo` happens to be `NULL`, what do you expect to 
happen?
   
   The original issue was that in the example above for a single row with a 
non-null `foo`, the code was evaluating the then branch with `[true]` as 
selection vector and the else branch with `[false]`. The latter was passed to 
`evaluate_selection` which then filters the record batch down to an empty 
record batch and then calls the else expression with that record batch. For an 
expression like `1/0`, you end up getting executing that division anyway even 
though the result would be discarded.
   
   There are two ways to fix this:
   1. don't evaluate binary expressions and literals for empty batches (as you 
had suggested in a comment earlier I believe)
   2. don't call evaluate with empty input batches
   
   The earlier fix had the effect of 2. as well, just in a less explicit way. 
The fix here does the same but adds the necessary checks in the explicit 
expr/expr code path. The code that's being seen as an optimisation is intended 
to prevent calling `evaluate_selection` with an all-false selection vector.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to