acking-you commented on PR #15694:
URL: https://github.com/apache/datafusion/pull/15694#issuecomment-2800087698
The relevant bug fixes have been completed, and corresponding performance
tests have been conducted. The results show that pre-selection has achieved
significant gains! @Dandandan @alamb
Compare the current optimization with the main branch using `cargo bench
--bench binary_op`. The results are as follows, where fluctuations within ±5%
are considered as no changes.
## Performance Comparison of the AND Logic Group
| Test Case | main | short-and-optimize | Ratio |
Change |
| --------------------- | --------- | ------------------ | ----------- |
---------- |
| all_false | 62.623 ns | 65.923 ns | 0.95x | no
changes |
| one_true_first | 448.69 µs | 195.60 µs | **2.29x** | ↑ ✅
|
| one_true_last | 452.00 µs | 171.91 µs | **2.63x** | ↑ ✅
|
| one_true_middle | 453.12 µs | 173.94 µs | **2.60x** | ↑ ✅
|
| one_true_middle_left | 453.06 µs | 165.70 µs | **2.73x** | ↑ ✅
|
| one_true_middle_right | 459.61 µs | 171.53 µs | **2.68x** | ↑ ✅
|
| all_true_in_and | 450.03 µs | 445.76 µs | 1.01x | no
changes |
## Performance Comparison of the OR Logic Group
| Test Case | main | short-and-optimize | Ratio | Change
|
| ---------------------- | --------- | ------------------ | ----- |
---------- |
| all_true | 61.162 ns | 64.430 ns | 0.95x | no
changes |
| one_false_first | 448.51 µs | 439.92 µs | 1.02x | no
changes |
| one_false_last | 447.38 µs | 453.64 µs | 0.99x | no
changes |
| one_false_middle | 457.79 µs | 447.15 µs | 1.02x | no
changes |
| one_false_middle_left | 452.78 µs | 447.75 µs | 1.01x | no
changes |
| one_false_middle_right | 451.21 µs | 444.23 µs | 1.02x | no
changes |
| all_false_in_or | 449.90 µs | 442.36 µs | 1.02x | no
changes |
## Possible next step(extend to nulls)
### Short-circuit optimization cannot be extended to nulls
The current short-circuit optimization is only applicable to cases without
null values. However, based on the calculation principles of "and" and "or", if
the left-hand side (lhs) evaluates to null, then the final result can only be
determined by continuing to calculate the right-hand side (rhs). Therefore,
optimization for this scenario is not feasible. Below is an example of a
calculation where lhs is null:
```sql
❯ select null and true;
+------------------------+
| NULL AND Boolean(true) |
+------------------------+
| |
+------------------------+
1 row in set. Query took 0.000 seconds.
❯ select null and false;
+-------------------------+
| NULL AND Boolean(false) |
+-------------------------+
| false |
+-------------------------+
1 row in set. Query took 0.000 seconds.
❯ select null or false;
+------------------------+
| NULL OR Boolean(false) |
+------------------------+
| |
+------------------------+
1 row in set. Query took 0.000 seconds.
❯ select null or true;
+-----------------------+
| NULL OR Boolean(true) |
+-----------------------+
| true |
+-----------------------+
1 row in set. Query took 0.000 seconds.
```
### Pre-selection can be extended to include nulls
As I explained earlier:
https://github.com/apache/datafusion/pull/15694#issuecomment-2799010340,
pre-selection can actually be extended to cover cases involving null values.
However, one point needs to be confirmed:
[filter_record_batch](https://docs.rs/arrow-select/54.2.1/src/arrow_select/filter.rs.html#202-205)
will retain rows that are null.
> 4. Combine the left-hand and right-hand boolean arrays to produce the
correct boolean array (modify the positions in the left-hand array marked as
true based on the values from the right-hand array).
Afterward, we only need to modify the fourth step of the pre-selection
process mentioned earlier to complete the extension that supports nulls.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]