pepijnve commented on code in PR #17973:
URL: https://github.com/apache/datafusion/pull/17973#discussion_r2417969832
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -431,6 +428,15 @@ impl CaseExpr {
_ => Cow::Owned(prep_null_mask_filter(when_value)),
};
+ let true_count = when_value.true_count();
+ if true_count == batch.num_rows() {
+ // Avoid evaluate_selection when all rows are true
+ return self.when_then_expr[0].1.evaluate(batch);
+ } else if true_count == 0 {
+ // Avoid evaluate_selection when all rows are false/null
+ return self.else_expr.as_ref().unwrap().evaluate(batch);
+ }
+
Review Comment:
> isn't this reintroducing the bug that was fixed in
https://github.com/apache/datafusion/issues/15384, just in a big more complex
wrapping?
No, why do you think that's the case? If you write `case foo is not null
then foo else 1/0 end` and `foo` happens to be `NULL`, what do you expect to
happen?
The original issue was that in the example above for a single row with a
non-null `foo`, the code was evaluating the then branch with `[true]` as
selection vector and the else branch with `[false]`. The latter was passed to
`evaluate_selection` which then filters the record batch down to an empty
record batch and then calls the else expression with that record batch. For an
expression like `1/0`, you end up getting executing that division anyway even
though the result would be discarded.
There are two ways to fix this:
1. don't evaluate binary expressions and literals for empty batches (as you
had suggested in a comment earlier I believe)
2. don't call evaluate with empty input batches
The earlier fix had the effect of 2. as well, just in a less explicit way.
The fix here does the same but adds the necessary checks in the explicit
expr/expr code path. The code that's being seen as an optimisation is intended
to prevent calling `evaluate_selection` with an all-false selection vector.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]