Jefffrey commented on code in PR #20374:
URL: https://github.com/apache/datafusion/pull/20374#discussion_r2825501405
##########
datafusion/functions-nested/src/array_has.rs:
##########
@@ -362,24 +373,34 @@ fn array_has_dispatch_for_scalar(
ArrayWrapper::LargeList(arr) => arr.nulls(),
};
- for (i, (start, end)) in haystack.offsets().tuple_windows().enumerate() {
- let length = end - start;
+ let offsets: Vec<usize> = haystack.offsets().collect();
+ let mut matches = eq_bits.set_indices().peekable();
+ let mut final_contained = vec![Some(false); haystack.len()];
+
+ for (i, window) in offsets.windows(2).enumerate() {
Review Comment:
```suggestion
for (i, (_start, end)) in haystack.offsets().tuple_windows().enumerate() {
```
We could still use `tuple_windows` here, which avoids collecting the offsets into a `Vec`.
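To illustrate the point about not needing a `collect`, here is a rough std-only sketch of walking adjacent offset pairs lazily while draining a sorted stream of match positions. It is not the PR's code: `rows_with_match` and `match_indices` are hypothetical names, and carrying `start` in a local stands in for itertools' `tuple_windows` so the snippet needs no external crate.

```rust
/// Hypothetical helper: for each list row `[start, end)` described by
/// consecutive offsets, report whether any match index falls inside it.
/// Assumes `match_indices` is sorted ascending, as set-bit indices would be.
fn rows_with_match(
    mut offsets: impl Iterator<Item = usize>,
    match_indices: impl Iterator<Item = usize>,
) -> Vec<bool> {
    let mut matches = match_indices.peekable();
    let mut out = Vec::new();

    // The first offset only opens the first row; no offsets means no rows.
    let Some(mut start) = offsets.next() else {
        return out;
    };

    // Each further offset closes one row, so no Vec of offsets is needed.
    for end in offsets {
        let mut hit = false;
        // Consume every match that lies before the end of this row.
        while let Some(m) = matches.next_if(|&m| m < end) {
            // Matches before `start` belong to no row and are skipped.
            hit |= m >= start;
        }
        out.push(hit);
        start = end;
    }
    out
}
```

With offsets `0, 3, 3, 5` and matches at `1` and `4`, the middle row is empty and so cannot contain a match, while the other two rows each contain one.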
##########
datafusion/functions-nested/src/array_has.rs:
##########
@@ -362,24 +373,34 @@ fn array_has_dispatch_for_scalar(
ArrayWrapper::LargeList(arr) => arr.nulls(),
};
- for (i, (start, end)) in haystack.offsets().tuple_windows().enumerate() {
- let length = end - start;
+ let offsets: Vec<usize> = haystack.offsets().collect();
+ let mut matches = eq_bits.set_indices().peekable();
+ let mut final_contained = vec![Some(false); haystack.len()];
Review Comment:
Another idea: replace `final_contained` with a
[`BooleanBufferBuilder`](https://docs.rs/arrow/latest/arrow/array/struct.BooleanBufferBuilder.html).
For null rows we can simply append `false`, then pass `validity` straight through
when creating the final boolean array, since its null buffer would be identical.
##########
datafusion/functions-nested/src/array_has.rs:
##########
@@ -362,24 +373,34 @@ fn array_has_dispatch_for_scalar(
ArrayWrapper::LargeList(arr) => arr.nulls(),
};
- for (i, (start, end)) in haystack.offsets().tuple_windows().enumerate() {
- let length = end - start;
+ let offsets: Vec<usize> = haystack.offsets().collect();
+ let mut matches = eq_bits.set_indices().peekable();
Review Comment:
I wonder if we could get further gains by using `set_slices()` instead of `set_indices()`?
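A rough std-only sketch of how run-based matching might look. It assumes the runs arrive as sorted, non-overlapping `(start, end)` pairs with an exclusive end, which is my understanding of what `set_slices()` yields. `rows_overlapping_runs` is a hypothetical name, and whether this is actually faster would need benchmarking.

```rust
/// Hypothetical helper: for each list row `[start, end)` described by
/// consecutive offsets, report whether it overlaps any run of set bits.
/// Assumes `runs` is sorted, non-overlapping, with exclusive run ends.
fn rows_overlapping_runs(
    mut offsets: impl Iterator<Item = usize>,
    runs: impl Iterator<Item = (usize, usize)>,
) -> Vec<bool> {
    let mut runs = runs.peekable();
    let mut out = Vec::new();

    let Some(mut start) = offsets.next() else {
        return out;
    };

    for end in offsets {
        // Discard runs that finish at or before the start of this row.
        while runs.next_if(|&(_, run_end)| run_end <= start).is_some() {}
        // A non-empty row matches if the next run begins before the row ends.
        let hit = start < end
            && matches!(runs.peek(), Some(&(run_start, _)) if run_start < end);
        out.push(hit);
        start = end;
    }
    out
}
```

A long run of consecutive matches is handled in one step per row it touches instead of one step per set bit, so any gain would depend on how clustered the matches are.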
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]