yjshen opened a new issue #2064:
URL: https://github.com/apache/arrow-datafusion/issues/2064
`CASE WHEN a > 0 THEN 25.0 / cast(a, float64) ELSE float64(null) END` will
fail with error: `ArrowError(DivideByZero)`, since we are computing `25 div a`
without checking the `when` condition.
To reproduce
```rust
#[test]
fn case_without_expr1() -> Result<()> {
let batch = case_test_batch1()?;
let schema = batch.schema();
// CASE WHEN a > 0 THEN 25.0 / cast(a, float64) ELSE float64(null)
END
let when1 = binary(
col("a", &schema)?,
Operator::Gt,
lit(ScalarValue::Int32(Some(0))),
&batch.schema(),
)?;
let then1 = binary(
lit(ScalarValue::Float64(Some(25.0))),
Operator::Divide,
cast(col("a", &schema)?, &batch.schema(), Float64)?,
&batch.schema(),
)?;
let x = lit(ScalarValue::Float64(None));
let expr = case(None, &[(when1, then1)], Some(x))?;
let result = expr.evaluate(&batch)?.into_array(batch.num_rows());
let result = result
.as_any()
.downcast_ref::<Float64Array>()
.expect("failed to downcast to Int32Array");
let expected = &Float64Array::from(vec![Some(25.0), None, None,
Some(5.0)]);
assert_eq!(expected, result);
Ok(())
}
fn case_test_batch1() -> Result<RecordBatch> {
let schema = Schema::new(vec![Field::new("a", DataType::Int32,
true)]);
let a = Int32Array::from(vec![Some(1), Some(0), None, Some(5)]);
let batch = RecordBatch::try_new(Arc::new(schema),
vec![Arc::new(a)])?;
Ok(batch)
}
```
The
[cause](https://github.com/apache/arrow-datafusion/blob/master/datafusion-physical-expr/src/expressions/case.rs#L344)
of the problem:
```rust
for i in (0..self.when_then_expr.len()).rev() {
let i = i as usize;
let when_value = self.when_then_expr[i].0.evaluate(batch)?;
let when_value = when_value.into_array(batch.num_rows());
let then_value = self.when_then_expr[i].1.evaluate(batch)?;
// <------------
let then_value = then_value.into_array(batch.num_rows());
```
We are not checking the "selection vector" generated by the `when clause` to
evaluate `then clause`.
A possible solution with minimum changes from arrow kernel computations
might be:
```rust
pub trait PhysicalExpr: Send + Sync + Display + Debug {
......
.....
/// Evaluate an expression against a RecordBatch with validity array
fn evaluate_selection(&self, batch: &RecordBatch, selection:
BooleanArray) -> Result<ColumnarValue> {
let valid_batch = take_all_valid(batch, &selection);
let tmp = self.evaluate(&valid_batch)?;
Ok(scatter(tmp, &selection))
}
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]