acking-you commented on PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2800087698

   The relevant bug fixes are done, and I have run the corresponding 
performance tests. The results show that pre-selection yields significant 
gains: roughly 2.3x to 2.7x on the `one_true_*` AND cases. @Dandandan  @alamb 
   
   I compared this branch (the `short-and-optimize` column) against `main` 
using `cargo bench --bench binary_op`. The Ratio column is the `main` time 
divided by the branch time, and differences within ±5% are treated as no change.
   
   ## Performance Comparison of the AND Logic Group
   
   | Test Case             | main      | short-and-optimize | Ratio     | Change    |
   | --------------------- | --------- | ------------------ | --------- | --------- |
   | all_false             | 62.623 ns | 65.923 ns          | 0.95x     | no change |
   | one_true_first        | 448.69 µs | 195.60 µs          | **2.29x** | ↑ ✅      |
   | one_true_last         | 452.00 µs | 171.91 µs          | **2.63x** | ↑ ✅      |
   | one_true_middle       | 453.12 µs | 173.94 µs          | **2.60x** | ↑ ✅      |
   | one_true_middle_left  | 453.06 µs | 165.70 µs          | **2.73x** | ↑ ✅      |
   | one_true_middle_right | 459.61 µs | 171.53 µs          | **2.68x** | ↑ ✅      |
   | all_true_in_and       | 450.03 µs | 445.76 µs          | 1.01x     | no change |
   
   ## Performance Comparison of the OR Logic Group
   
   | Test Case              | main      | short-and-optimize | Ratio | Change    |
   | ---------------------- | --------- | ------------------ | ----- | --------- |
   | all_true               | 61.162 ns | 64.430 ns          | 0.95x | no change |
   | one_false_first        | 448.51 µs | 439.92 µs          | 1.02x | no change |
   | one_false_last         | 447.38 µs | 453.64 µs          | 0.99x | no change |
   | one_false_middle       | 457.79 µs | 447.15 µs          | 1.02x | no change |
   | one_false_middle_left  | 452.78 µs | 447.75 µs          | 1.01x | no change |
   | one_false_middle_right | 451.21 µs | 444.23 µs          | 1.02x | no change |
   | all_false_in_or        | 449.90 µs | 442.36 µs          | 1.02x | no change |
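
   For anyone re-deriving the Ratio and Change columns, here is a minimal sketch of the arithmetic. It assumes the ratio is `main` time divided by branch time (which matches the numbers in the tables) and that "no change" means the ratio lies within the tolerance around 1.0. The exact rule used for the tables is not stated, and the names `Change`, `ratio`, and `classify` are made up for illustration.

   ```rust
   /// Hypothetical classification of a benchmark comparison.
   #[derive(Debug, PartialEq)]
   enum Change {
       Improved,
       NoChange,
       Regressed,
   }

   /// Speedup ratio: values above 1.0 mean the branch is faster than main.
   /// Both times must be expressed in the same unit.
   fn ratio(main_time: f64, branch_time: f64) -> f64 {
       main_time / branch_time
   }

   /// Assumed rule: anything within `tolerance` of 1.0 counts as noise.
   fn classify(ratio: f64, tolerance: f64) -> Change {
       if ratio > 1.0 + tolerance {
           Change::Improved
       } else if ratio < 1.0 - tolerance {
           Change::Regressed
       } else {
           Change::NoChange
       }
   }
   ```

   With the `one_true_first` row, `ratio(448.69, 195.60)` rounds to 2.29 and is classified as improved at a tolerance of 0.05, while `all_true_in_and` (450.03 vs 445.76) falls inside the tolerance.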
   
   
   ## Possible next step (extend to nulls)
   ### Short-circuit optimization cannot be extended to nulls
   The current short-circuit optimization applies only to cases without null 
values. Under the evaluation rules of `AND` and `OR`, a null left-hand side 
(lhs) never determines the result by itself: the right-hand side (rhs) must 
still be evaluated to know the final value. Short-circuiting therefore cannot 
be extended to this scenario. The following examples show the results when lhs 
is null:
   ```sql
   ❯ select null and true;
   +------------------------+
   | NULL AND Boolean(true) |
   +------------------------+
   |                        |
   +------------------------+
   1 row in set. Query took 0.000 seconds.
   
   ❯ select null and false;
   +-------------------------+
   | NULL AND Boolean(false) |
   +-------------------------+
   | false                   |
   +-------------------------+
   1 row in set. Query took 0.000 seconds.
   
   ❯ select null or false;
   +------------------------+
   | NULL OR Boolean(false) |
   +------------------------+
   |                        |
   +------------------------+
   1 row in set. Query took 0.000 seconds.
   
   ❯ select null or true;
   +-----------------------+
   | NULL OR Boolean(true) |
   +-----------------------+
   | true                  |
   +-----------------------+
   1 row in set. Query took 0.000 seconds.
   ```
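
   The same truth table can be written as a small sketch, with `Option<bool>` standing in for a nullable boolean (`None` is NULL). This is only an illustration of the semantics shown in the queries above, not the kernel DataFusion actually calls, and the function names are made up.

   ```rust
   /// SQL three-valued AND; `None` models NULL.
   fn kleene_and(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
       match (lhs, rhs) {
           // A single false decides the result, whatever the other side is.
           (Some(false), _) | (_, Some(false)) => Some(false),
           (Some(true), Some(true)) => Some(true),
           // Every remaining combination involves a NULL and no false.
           _ => None,
       }
   }

   /// SQL three-valued OR; `None` models NULL.
   fn kleene_or(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
       match (lhs, rhs) {
           // A single true decides the result, whatever the other side is.
           (Some(true), _) | (_, Some(true)) => Some(true),
           (Some(false), Some(false)) => Some(false),
           // Every remaining combination involves a NULL and no true.
           _ => None,
       }
   }

   /// True only when lhs alone fixes the result of `lhs AND rhs`.
   fn and_decided_by_lhs(lhs: Option<bool>) -> bool {
       lhs == Some(false)
   }
   ```

   Note that `and_decided_by_lhs(None)` is false: with a null lhs, `AND` can still yield either NULL or false depending on rhs, which is why the short-circuit cannot fire.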
   
   ### Pre-selection can be extended to include nulls
   
   As I explained in an 
[earlier comment](https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340), 
pre-selection can be extended to cover cases involving null values. 
However, one point needs to be confirmed first: that 
[filter_record_batch](https://docs.rs/arrow-select/54.2.1/src/arrow_select/filter.rs.html#202-205) 
retains rows whose predicate value is null.
   
   > 4. Combine the left-hand and right-hand boolean arrays to produce the 
correct boolean array (modify the positions in the left-hand array marked as 
true based on the values from the right-hand array).
   
   Once that is confirmed, only the fourth step of the pre-selection process 
(quoted above) needs to change to support nulls.
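
   A rough sketch of what the null-aware version could look like for `AND`, using plain `Vec<Option<bool>>` instead of Arrow arrays. Everything here is an assumption for illustration: the names are hypothetical, the rhs evaluation is modeled as a closure over selected row indices, and the selection is built by hand so the sketch does not depend on how `filter_record_batch` treats null predicate values.

   ```rust
   /// SQL three-valued AND; `None` models NULL.
   fn kleene_and(lhs: Option<bool>, rhs: Option<bool>) -> Option<bool> {
       match (lhs, rhs) {
           (Some(false), _) | (_, Some(false)) => Some(false),
           (Some(true), Some(true)) => Some(true),
           _ => None,
       }
   }

   /// Hypothetical null-aware pre-selection for `lhs AND rhs`.
   /// `eval_rhs` receives the selected row indices and must return one
   /// rhs value per selected row, in the same order.
   fn and_with_preselection<F>(lhs: &[Option<bool>], eval_rhs: F) -> Vec<Option<bool>>
   where
       F: Fn(&[usize]) -> Vec<Option<bool>>,
   {
       // Rows where lhs is false are already decided. Rows where lhs is
       // true OR null still need rhs, so both kinds are selected.
       let selected: Vec<usize> = lhs
           .iter()
           .enumerate()
           .filter(|(_, v)| **v != Some(false))
           .map(|(i, _)| i)
           .collect();

       // Evaluate rhs only on the selected rows.
       let rhs = eval_rhs(&selected);
       assert_eq!(rhs.len(), selected.len());

       // Modified "step 4": start from lhs and overwrite each selected
       // position with the three-valued AND of lhs and rhs.
       let mut out = lhs.to_vec();
       for (&row, &r) in selected.iter().zip(rhs.iter()) {
           out[row] = kleene_and(lhs[row], r);
       }
       out
   }
   ```

   The only difference from the non-null case is that null lhs rows are kept in the selection and the write-back uses three-valued `AND` rather than copying rhs directly. `OR` would be symmetric: select rows where lhs is not true.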
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

