acking-you commented on PR #15694: URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2800087698
The relevant bug fixes have been completed, and corresponding performance tests have been conducted. The results show that pre-selection has achieved significant gains! @Dandandan @alamb Compare the current optimization with the main branch using `cargo bench --bench binary_op`. The results are as follows, where fluctuations within ±5% are considered as no changes. ## Performance Comparison of the AND Logic Group | Test Case | main | short-and-optimize | Ratio | Change | | --------------------- | --------- | ------------------ | ----------- | ---------- | | all_false | 62.623 ns | 65.923 ns | 0.95x | no changes | | one_true_first | 448.69 µs | 195.60 µs | **2.29x** | ↑ ✅ | | one_true_last | 452.00 µs | 171.91 µs | **2.63x** | ↑ ✅ | | one_true_middle | 453.12 µs | 173.94 µs | **2.60x** | ↑ ✅ | | one_true_middle_left | 453.06 µs | 165.70 µs | **2.73x** | ↑ ✅ | | one_true_middle_right | 459.61 µs | 171.53 µs | **2.68x** | ↑ ✅ | | all_true_in_and | 450.03 µs | 445.76 µs | 1.01x | no changes | ## Performance Comparison of the OR Logic Group | Test Case | main | short-and-optimize | Ratio | Change | | ---------------------- | --------- | ------------------ | ----- | ---------- | | all_true | 61.162 ns | 64.430 ns | 0.95x | no changes | | one_false_first | 448.51 µs | 439.92 µs | 1.02x | no changes | | one_false_last | 447.38 µs | 453.64 µs | 0.99x | no changes | | one_false_middle | 457.79 µs | 447.15 µs | 1.02x | no changes | | one_false_middle_left | 452.78 µs | 447.75 µs | 1.01x | no changes | | one_false_middle_right | 451.21 µs | 444.23 µs | 1.02x | no changes | | all_false_in_or | 449.90 µs | 442.36 µs | 1.02x | no changes | ## Possible next step(extend to nulls) ### Short-circuit optimization cannot be extended to nulls The current short-circuit optimization is only applicable to cases without null values. However, based on the calculation principles of "and" and "or", if the left-hand side (lhs) evaluates to null, then the final result can only be determined by continuing to calculate the right-hand side (rhs). Therefore, optimization for this scenario is not feasible. Below is an example of a calculation where lhs is null: ```sql ❯ select null and true; +------------------------+ | NULL AND Boolean(true) | +------------------------+ | | +------------------------+ 1 row in set. Query took 0.000 seconds. ❯ select null and false; +-------------------------+ | NULL AND Boolean(false) | +-------------------------+ | false | +-------------------------+ 1 row in set. Query took 0.000 seconds. ❯ select null or false; +------------------------+ | NULL OR Boolean(false) | +------------------------+ | | +------------------------+ 1 row in set. Query took 0.000 seconds. ❯ select null or true; +-----------------------+ | NULL OR Boolean(true) | +-----------------------+ | true | +-----------------------+ 1 row in set. Query took 0.000 seconds. ``` ### Pre-selection can be extended to include nulls As I explained earlier: https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340, pre-selection can actually be extended to cover cases involving null values. However, one point needs to be confirmed: [filter_record_batch](https://docs.rs/arrow-select/54.2.1/src/arrow_select/filter.rs.html#202-205) will retain rows that are null. > 4. Combine the left-hand and right-hand boolean arrays to produce the correct boolean array (modify the positions in the left-hand array marked as true based on the values from the right-hand array). Afterward, we only need to modify the fourth step of the pre-selection process mentioned earlier to complete the extension that supports nulls. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org