alamb commented on code in PR #6089:
URL: https://github.com/apache/arrow-rs/pull/6089#discussion_r1685722429
##########
arrow-arith/src/aggregate.rs:
##########
@@ -410,14 +413,49 @@ where
}
}
+/// Helper to compute min/max of [`GenericByteViewArray<T>`].
+/// The specialized min/max leverages the inlined values to compare the byte
views.
+fn min_max_view_helper<T: ByteViewType>(
Review Comment:
Could you also add some comments for what `swap_cond` means in this context?
##########
arrow-arith/src/aggregate.rs:
##########
@@ -410,14 +413,49 @@ where
}
}
+/// Helper to compute min/max of [`GenericByteViewArray<T>`].
+/// The specialized min/max leverages the inlined values to compare the byte
views.
+fn min_max_view_helper<T: ByteViewType>(
+ array: &GenericByteViewArray<T>,
+ swap_cond: cmp::Ordering,
+) -> Option<&T::Native> {
+ let null_count = array.null_count();
+ if null_count == array.len() {
+ None
+ } else if null_count == 0 {
+ let min_idx = (0..array.len()).reduce(|acc, item| {
+ let cmp = unsafe { compare_byte_view_unchecked(array, acc, array,
item) };
+ if cmp == swap_cond {
+ item
+ } else {
+ acc
+ }
+ });
+ unsafe { min_idx.map(|idx| array.value_unchecked(idx)) }
Review Comment:
```suggestion
// Safety: idx came from valid range `0..array.len()`
unsafe { min_idx.map(|idx| array.value_unchecked(idx)) }
```
##########
arrow-arith/src/aggregate.rs:
##########
@@ -410,14 +413,49 @@ where
}
}
+/// Helper to compute min/max of [`GenericByteViewArray<T>`].
+/// The specialized min/max leverages the inlined values to compare the byte
views.
+fn min_max_view_helper<T: ByteViewType>(
+ array: &GenericByteViewArray<T>,
+ swap_cond: cmp::Ordering,
+) -> Option<&T::Native> {
+ let null_count = array.null_count();
+ if null_count == array.len() {
+ None
+ } else if null_count == 0 {
+ let min_idx = (0..array.len()).reduce(|acc, item| {
+ let cmp = unsafe { compare_byte_view_unchecked(array, acc, array,
item) };
Review Comment:
I wonder if using `idx` rather than `item` would make it clearer that `item`
is an array index and would also be more consistent with `min_idx` naming
Also, I think we should justify the `unsafe` in a comment as good practice.
Something like
```suggestion
// SAFETY: array's length is correct so item is within bounds
let cmp = unsafe { compare_byte_view_unchecked(array, acc,
array, item) };
```
##########
arrow-arith/Cargo.toml:
##########
@@ -38,6 +38,7 @@ arrow-array = { workspace = true }
arrow-buffer = { workspace = true }
arrow-data = { workspace = true }
arrow-schema = { workspace = true }
+arrow-ord = { workspace = true }
Review Comment:
I agree we should not add any new dependencies -- I think @tustvold 's idea
was to keep the crates as separate as possible so people could keep the
arrow-rs code they used in their project as small as possible
Thus I think we should move `compare_byte_view_unchecked` instead of this
##########
arrow-arith/src/aggregate.rs:
##########
@@ -410,14 +413,49 @@ where
}
}
+/// Helper to compute min/max of [`GenericByteViewArray<T>`].
+/// The specialized min/max leverages the inlined values to compare the byte
views.
+fn min_max_view_helper<T: ByteViewType>(
+ array: &GenericByteViewArray<T>,
+ swap_cond: cmp::Ordering,
+) -> Option<&T::Native> {
+ let null_count = array.null_count();
+ if null_count == array.len() {
+ None
+ } else if null_count == 0 {
+ let min_idx = (0..array.len()).reduce(|acc, item| {
Review Comment:
Given this is for both min and max, maybe calling this `target_idx` or
`found_idx` would be clearer
##########
arrow-arith/src/aggregate.rs:
##########
@@ -410,14 +413,49 @@ where
}
}
+/// Helper to compute min/max of [`GenericByteViewArray<T>`].
+/// The specialized min/max leverages the inlined values to compare the byte
views.
+fn min_max_view_helper<T: ByteViewType>(
+ array: &GenericByteViewArray<T>,
+ swap_cond: cmp::Ordering,
+) -> Option<&T::Native> {
+ let null_count = array.null_count();
+ if null_count == array.len() {
+ None
+ } else if null_count == 0 {
+ let min_idx = (0..array.len()).reduce(|acc, item| {
+ let cmp = unsafe { compare_byte_view_unchecked(array, acc, array,
item) };
+ if cmp == swap_cond {
+ item
+ } else {
+ acc
+ }
+ });
+ unsafe { min_idx.map(|idx| array.value_unchecked(idx)) }
+ } else {
+ let nulls = array.nulls().unwrap();
Review Comment:
Same naming comments as above
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]