findepi commented on code in PR #17973:
URL: https://github.com/apache/datafusion/pull/17973#discussion_r2419023180
##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -431,6 +428,15 @@ impl CaseExpr {
_ => Cow::Owned(prep_null_mask_filter(when_value)),
};
+ let true_count = when_value.true_count();
+ if true_count == batch.num_rows() {
+ // Avoid evaluate_selection when all rows are true
+ return self.when_then_expr[0].1.evaluate(batch);
+ } else if true_count == 0 {
+ // Avoid evaluate_selection when all rows are false/null
+ return self.else_expr.as_ref().unwrap().evaluate(batch);
+ }
+
Review Comment:
I agree with the change in `evaluate_selection`
Now that `evaluate_selection` is changed, do we need those lines here?
(They look nice but my only concern is additional code complexity which is
harder to cover with SLT tests. Now that we have branching here, we should have
a bunch of SLT cases that clearly exercise all-true, all-false, some-true
situations.)
##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -97,14 +97,26 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug
+ DynEq + DynHash {
batch: &RecordBatch,
selection: &BooleanArray,
) -> Result<ColumnarValue> {
- let tmp_batch = filter_record_batch(batch, selection)?;
+ let selection_count = selection.true_count();
- let tmp_result = self.evaluate(&tmp_batch)?;
Review Comment:
There is a lot of new code which repeats logic of `filter_record_batch`.
What if we just changed this line only?
```rust
let tmp_result = if tmp_batch.is_empty {
// Do not call `evaluate` when the selection is empty.
// When `evaluate_selection` is being used for conditional, lazy
evaluation,
// evaluating an expression for a false selection vector may end
up unintentionally
// evaluating a fallible expression.
let datatype = self.data_type(batch.schema_ref().as_ref())?;
ColumnarValue::Array(make_builder(&datatype, 0).finish())
} else {
self.evaluate(&tmp_batch)?;
}
```
Note how this does not inspect `selection` / `selection_count` directly,
leveraging the work done by `filter_record_batch`.
##########
datafusion/physical-expr-common/src/physical_expr.rs:
##########
@@ -97,14 +97,26 @@ pub trait PhysicalExpr: Any + Send + Sync + Display + Debug
+ DynEq + DynHash {
batch: &RecordBatch,
selection: &BooleanArray,
) -> Result<ColumnarValue> {
- let tmp_batch = filter_record_batch(batch, selection)?;
+ let selection_count = selection.true_count();
- let tmp_result = self.evaluate(&tmp_batch)?;
+ if batch.num_rows() == 0 || selection_count == batch.num_rows() {
Review Comment:
```suggestion
if selection_count == batch.num_rows() {
```
(i'd assume `batch.num_rows() == 0` implies that also `selection_count ==
0`. If it is not the case, we have a more permissive `if` condition, but
requires a code comment)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]