alamb commented on code in PR #11534:
URL: https://github.com/apache/datafusion/pull/11534#discussion_r1683479192


##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -86,13 +108,35 @@ impl CaseExpr {
         when_then_expr: Vec<WhenThen>,
         else_expr: Option<Arc<dyn PhysicalExpr>>,
     ) -> Result<Self> {
+        // normalize null literals to None in the else_expr (this already 
happens
+        // during SQL planning, but not necessarily for other use cases)
+        let else_expr = match &else_expr {
+            Some(e) => match e.as_any().downcast_ref::<Literal>() {
+                Some(lit) if lit.value().is_null() => None,
+                _ => else_expr,
+            },
+            _ => else_expr,
+        };
+
         if when_then_expr.is_empty() {
             exec_err!("There must be at least one WHEN clause")
         } else {
+            let eval_method = if expr.is_some() {
+                EvalMethod::WithExpression
+            } else if when_then_expr.len() == 1

Review Comment:
   I was thinking about this optimization and I think it would be valid for any 
CASE expression that had a NULL ELSE, not just column
   
   So in other words, I think you could remove the 
`when_then_expr[0].1.as_any().is::<Column>()` check and this would still work 
fine



##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -256,6 +300,36 @@ impl CaseExpr {
 
         Ok(ColumnarValue::Array(current_value))
     }
+
+    /// This function evaluates the specialized case of:
+    ///
+    /// CASE WHEN condition THEN column
+    ///      [ELSE NULL]
+    /// END
+    fn case_column_or_null(&self, batch: &RecordBatch) -> 
Result<ColumnarValue> {
+        let when_expr = &self.when_then_expr[0].0;
+        let then_expr = &self.when_then_expr[0].1;
+        if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? {
+            let bit_mask = bit_mask
+                .as_any()
+                .downcast_ref::<BooleanArray>()
+                .expect("predicate should evaluate to a boolean array");
+            // invert the bitmask
+            let bit_mask = not(bit_mask)?;
+            match then_expr.evaluate(batch)? {
+                ColumnarValue::Array(array) => {
+                    Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?))
+                }
+                ColumnarValue::Scalar(_) => Err(DataFusionError::Execution(

Review Comment:
   this might be better as an internal error
   
   You could also use the macros like
   
   ```rust
   internal_err!("expression did not evaluate to an array")
   ```



##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -998,6 +1080,53 @@ mod tests {
         Ok(())
     }
 
+    #[test]

Review Comment:
   I think it would be good to make sure we had a `slt` level test to cover 
this as well,
   
   Maybe in 
   
https://github.com/apache/datafusion/blob/382bf4f3c7a730828684b9e4ce01369b89717e19/datafusion/sqllogictest/test_files/expr.slt
   
   Or we could start adding a file just for CASE if we are about to spend a 
bunch of time optimizing it 🤔 



##########
datafusion/physical-expr/src/expressions/case.rs:
##########
@@ -256,6 +300,36 @@ impl CaseExpr {
 
         Ok(ColumnarValue::Array(current_value))
     }
+
+    /// This function evaluates the specialized case of:
+    ///
+    /// CASE WHEN condition THEN column
+    ///      [ELSE NULL]
+    /// END
+    fn case_column_or_null(&self, batch: &RecordBatch) -> 
Result<ColumnarValue> {
+        let when_expr = &self.when_then_expr[0].0;
+        let then_expr = &self.when_then_expr[0].1;
+        if let ColumnarValue::Array(bit_mask) = when_expr.evaluate(batch)? {
+            let bit_mask = bit_mask
+                .as_any()
+                .downcast_ref::<BooleanArray>()
+                .expect("predicate should evaluate to a boolean array");
+            // invert the bitmask
+            let bit_mask = not(bit_mask)?;
+            match then_expr.evaluate(batch)? {
+                ColumnarValue::Array(array) => {
+                    Ok(ColumnarValue::Array(nullif(&array, &bit_mask)?))
+                }
+                ColumnarValue::Scalar(_) => Err(DataFusionError::Execution(
+                    "expression did not evaluate to an array".to_string(),
+                )),
+            }
+        } else {
+            Err(DataFusionError::Execution(
+                "predicate did not evaluate to an array".to_string(),

Review Comment:
   I agree with @viirya  I think it would be fairly straightforward here to 
handle `ColumnarValue::Scalar` s well. I don't think we need to do it in this PR
   
   Given you say  in 
https://github.com/apache/datafusion/pull/11534/files#r1683081792 
   
   > I am planning on adding a specialization for scalar values as well to 
avoid converting scalars to arrays
   
   
   I think we could definitely do it in a follow on PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to