alamb commented on code in PR #11534: URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683479192
########## datafusion/physical-expr/src/expressions/case.rs: ########## @@ -86,13 +108,35 @@ impl CaseExpr { when_then_expr: Vec<WhenThen>, else_expr: Option<Arc<dyn PhysicalExpr>>, ) -> Result<Self> { + // normalize null literals to None in the else_expr (this already happens + // during SQL planning, but not necessarily for other use cases) + let else_expr = match &else_expr { + Some(e) => match e.as_any().downcast_ref::<Literal>() { + Some(lit) if lit.value().is_null() => None, + _ => else_expr, + }, + _ => else_expr, + }; + if when_then_expr.is_empty() { exec_err!("There must be at least one WHEN clause") } else { + let eval_method = if expr.is_some() { + EvalMethod::WithExpression + } else if when_then_expr.len() == 1 Review Comment: I was thinking about this optimization and I think it would be valid for any CASE expression that had a NULL ELSE, not just column So in other words, I think you could remove the `when_then_expr[0].1.as_any().is::<Column>()` check and this would still work fine ########## datafusion/physical-expr/src/expressions/case.rs: ########## @@ -256,6 +300,36 @@ impl CaseExpr { Ok(ColumnarValue::Array(current_value)) } + + /// This function evaluates the specialized case of: + /// + /// CASE WHEN condition THEN column + /// [ELSE NULL] + /// END + fn case_column_or_null(&self, batch: &RecordBatch) -> Result<ColumnarValue> { + let when_expr = &self.when_then_expr[0].0; + let then_expr = &self.when_then_expr[0].1; + if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? { + let bit_mask = bit_mask + .as_any() + .downcast_ref::<BooleanArray>() + .expect("predicate should evaluate to a boolean array"); + // invert the bitmask + let bit_mask = not(bit_mask)?; + match then_expr.evaluate(batch)? { + ColumnarValue::Array(array) => { + Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?)) + } + ColumnarValue::Scalar(_) => Err(DataFusionError::Execution( Review Comment: this might be better as an internal error You could also use the macros like ```rust internal_err!("expression did not evaluate to an array") ``` ########## datafusion/physical-expr/src/expressions/case.rs: ########## @@ -998,6 +1080,53 @@ mod tests { Ok(()) } + #[test] Review Comment: I think it would be good to make sure we had a `slt` level test to cover this as well, Maybe in https://github.com/apache/datafusion/blob/382bf4f3c7a730828684b9e4ce01369b89717e19/datafusion/sqllogictest/test_files/expr.slt Or we could start adding a file just for CASE if we are about to spend a bunch of time optimizing it 🤔 ########## datafusion/physical-expr/src/expressions/case.rs: ########## @@ -256,6 +300,36 @@ impl CaseExpr { Ok(ColumnarValue::Array(current_value)) } + + /// This function evaluates the specialized case of: + /// + /// CASE WHEN condition THEN column + /// [ELSE NULL] + /// END + fn case_column_or_null(&self, batch: &RecordBatch) -> Result<ColumnarValue> { + let when_expr = &self.when_then_expr[0].0; + let then_expr = &self.when_then_expr[0].1; + if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? { + let bit_mask = bit_mask + .as_any() + .downcast_ref::<BooleanArray>() + .expect("predicate should evaluate to a boolean array"); + // invert the bitmask + let bit_mask = not(bit_mask)?; + match then_expr.evaluate(batch)? { + ColumnarValue::Array(array) => { + Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?)) + } + ColumnarValue::Scalar(_) => Err(DataFusionError::Execution( + "expression did not evaluate to an array".to_string(), + )), + } + } else { + Err(DataFusionError::Execution( + "predicate did not evaluate to an array".to_string(), Review Comment: I agree with @viirya I think it would be fairly straightforward here to handle `ColumnarValue::Scalar` s well. I don't think we need to do it in this PR Given you say in https://github.com/apache/datafusion/pull/11534/files#r1683081792 > I am planning on adding a specialization for scalar values as well to avoid converting scalars to arrays I think we could definitely do it in a follow on PR -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org