mwylde opened a new issue, #16928:
URL: https://github.com/apache/datafusion/issues/16928

   ### Describe the bug
   
   When evaluating an AND expression, if fewer than 20% of the LHS values are 
true, the evaluator will use a "pre selection" strategy wherein we evaluate the 
right hand only for those rows where the left hand is true.
   
   However, there is a bug in how scalars are handled for the right hand: 
   
   
https://github.com/apache/datafusion/blame/9deec2ad725813e266dc8c51ec75302fed7412f8/datafusion/physical-expr/src/expressions/binary.rs#L979-L982
   
   For any scalar RHS we just return the scalar, which is not generally correct.
   
   For example, in the expession (`x = 5 AND true`) evaluated against [3, 5, 
10, 12, 1] this will return [true, true, true, true], whereas the correct 
result is `[false, true, false, false, false]`.
   
   
   
   ### To Reproduce
   
   This is somewhat hard to reproduce as expressions that trigger the issue 
will generally be optimized away, however it can be triggered by unoptimized 
expressions or expressions that are manually constructed.
   
   This test exhibits the behavior
   
   ```rust
       #[test]
       fn test_and_true_preselection_returns_lhs() {
           let schema =
               Arc::new(Schema::new(vec![Field::new("c", DataType::Boolean, 
false)]));
           let c_array = Arc::new(BooleanArray::from(vec![false, true, false, 
false, false]))
               as ArrayRef;
           let batch = RecordBatch::try_new(Arc::clone(&schema), 
vec![Arc::clone(&c_array)])
               .unwrap();
   
           let expr = logical2physical(&logical_col("c").and(expr_lit(true)), 
&schema);
   
           let lhs_value = logical2physical(&logical_col("c"), &schema)
               .evaluate(&batch)
               .unwrap();
           assert!(matches!(
               check_short_circuit(&lhs_value, &Operator::And),
               ShortCircuitStrategy::PreSelection(_)
           ));
   
           let result = expr.evaluate(&batch).unwrap();
           let ColumnarValue::Array(result_arr) = result else {
               panic!("Expected ColumnarValue::Array");
           };
   
           let expected: Vec<_> = c_array.as_boolean().iter().collect();
           let actual: Vec<_> = result_arr.as_boolean().iter().collect();
           assert_eq!(
               expected, actual,
               "AND with TRUE must equal LHS even with PreSelection"
           );
       }
   ```
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to