alamb commented on a change in pull request #2068:
URL: https://github.com/apache/arrow-datafusion/pull/2068#discussion_r835884359
##########
File path: datafusion-physical-expr/src/physical_expr.rs
##########
@@ -38,4 +43,74 @@ pub trait PhysicalExpr: Send + Sync + Display + Debug {
fn nullable(&self, input_schema: &Schema) -> Result<bool>;
/// Evaluate an expression against a RecordBatch
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;
+ /// Evaluate an expression against a RecordBatch with validity array
+ fn evaluate_selection(
+ &self,
+ batch: &RecordBatch,
+ selection: &BooleanArray,
+ ) -> Result<ColumnarValue> {
+ let mut indices = vec![];
+ for (i, b) in selection.iter().enumerate() {
+ if let Some(true) = b {
+ indices.push(i as u64);
+ }
+ }
+ let indices = UInt64Array::from_iter_values(indices);
Review comment:
I was just thinking it might be possible to do something like the
following pseudocode:
```rust
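// Pseudocode: `null_mask`, `replace_null_mask`, and `compute_expr` are
// illustrative names, not checked against the arrow API.
// Rows that are not selected are treated as null.
let mask = and(old_array.null_mask(), selection);
let new_array = old_array.replace_null_mask(mask);
let result = compute_expr(new_array);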
```
And skip having to scatter / gather.
However, given this code works and is covered by tests, maybe we can revisit
the approach if there is some performance or correctness issue in the future.
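
To make the comparison concrete, here is a minimal std-only sketch of the two strategies. Everything in it is an assumption for illustration: `NullableI64` is a toy stand-in for an Arrow array, `add_one` plays the role of `compute_expr`, and `eval_gather_scatter` / `eval_masked` are hypothetical names, not DataFusion or arrow functions. It only shows that, for an element-wise kernel that propagates nulls, masking the validity gives the same answer as gathering the selected rows and scattering the results back.

```rust
/// Toy nullable i64 column: `values[i]` is meaningful only when `validity[i]` is true.
/// Illustrative stand-in for an Arrow array, not the arrow crate's type.
#[derive(Debug, Clone, PartialEq)]
struct NullableI64 {
    values: Vec<i64>,
    validity: Vec<bool>,
}

/// Hypothetical stand-in for `compute_expr`: adds one to every valid slot.
/// Null slots stay null and carry a placeholder value of 0.
fn add_one(input: &NullableI64) -> NullableI64 {
    NullableI64 {
        values: input
            .values
            .iter()
            .zip(&input.validity)
            .map(|(v, ok)| if *ok { v + 1 } else { 0 })
            .collect(),
        validity: input.validity.clone(),
    }
}

/// Gather / scatter: the shape of the approach in the diff.
/// Only rows where the selection is `Some(true)` are evaluated.
fn eval_gather_scatter(input: &NullableI64, selection: &[Option<bool>]) -> NullableI64 {
    // Gather the indices of the selected rows.
    let indices: Vec<usize> = selection
        .iter()
        .enumerate()
        .filter(|(_, b)| **b == Some(true))
        .map(|(i, _)| i)
        .collect();
    let gathered = NullableI64 {
        values: indices.iter().map(|&i| input.values[i]).collect(),
        validity: indices.iter().map(|&i| input.validity[i]).collect(),
    };
    let computed = add_one(&gathered);

    // Scatter the results back; unselected rows become null.
    let len = input.values.len();
    let mut out = NullableI64 {
        values: vec![0; len],
        validity: vec![false; len],
    };
    for (k, &i) in indices.iter().enumerate() {
        out.values[i] = computed.values[k];
        out.validity[i] = computed.validity[k];
    }
    out
}

/// Masking: AND the selection into the validity and evaluate once over
/// the full-length column. A null selection entry counts as "not selected".
fn eval_masked(input: &NullableI64, selection: &[Option<bool>]) -> NullableI64 {
    let mask: Vec<bool> = input
        .validity
        .iter()
        .zip(selection)
        .map(|(ok, sel)| *ok && *sel == Some(true))
        .collect();
    let masked = NullableI64 {
        values: input.values.clone(),
        validity: mask,
    };
    add_one(&masked)
}
```

The equivalence relies on the kernel skipping null slots. A kernel that computes over every slot regardless of validity could still hit errors (for example division by zero) on unselected rows, which would be one of the correctness issues to check before switching approaches.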