mwylde opened a new issue, #16928: URL: https://github.com/apache/datafusion/issues/16928
### Describe the bug When evaluating an AND expression, if fewer than 20% of the LHS values are true, the evaluator will use a "pre selection" strategy wherein we evaluate the right hand only for those rows where the left hand is true. However, there is a bug in how scalars are handled for the right hand: https://github.com/apache/datafusion/blame/9deec2ad725813e266dc8c51ec75302fed7412f8/datafusion/physical-expr/src/expressions/binary.rs#L979-L982 For any scalar RHS we just return the scalar, which is not generally correct. For example, in the expession (`x = 5 AND true`) evaluated against [3, 5, 10, 12, 1] this will return [true, true, true, true], whereas the correct result is `[false, true, false, false, false]`. ### To Reproduce This is somewhat hard to reproduce as expressions that trigger the issue will generally be optimized away, however it can be triggered by unoptimized expressions or expressions that are manually constructed. This test exhibits the behavior ```rust #[test] fn test_and_true_preselection_returns_lhs() { let schema = Arc::new(Schema::new(vec![Field::new("c", DataType::Boolean, false)])); let c_array = Arc::new(BooleanArray::from(vec![false, true, false, false, false])) as ArrayRef; let batch = RecordBatch::try_new(Arc::clone(&schema), vec![Arc::clone(&c_array)]) .unwrap(); let expr = logical2physical(&logical_col("c").and(expr_lit(true)), &schema); let lhs_value = logical2physical(&logical_col("c"), &schema) .evaluate(&batch) .unwrap(); assert!(matches!( check_short_circuit(&lhs_value, &Operator::And), ShortCircuitStrategy::PreSelection(_) )); let result = expr.evaluate(&batch).unwrap(); let ColumnarValue::Array(result_arr) = result else { panic!("Expected ColumnarValue::Array"); }; let expected: Vec<_> = c_array.as_boolean().iter().collect(); let actual: Vec<_> = result_arr.as_boolean().iter().collect(); assert_eq!( expected, actual, "AND with TRUE must equal LHS even with PreSelection" ); } ``` ### Expected behavior _No response_ ### Additional context _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org