alamb commented on code in PR #6231:
URL: https://github.com/apache/arrow-rs/pull/6231#discussion_r1722248203
##########
arrow-array/src/array/byte_view_array.rs:
##########
@@ -278,6 +279,22 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
T::Native::from_bytes_unchecked(b)
}
+ /// Returns the bytes at index `i`
+ /// # Safety
+ /// Caller is responsible for ensuring that the index is within the bounds
of the array
+ pub unsafe fn bytes_unchecked(&self, idx: usize) -> &[u8] {
Review Comment:
I also didn't see `bytes_unchecked` used anywhere in the PR now 🤔
##########
arrow-string/src/predicate.rs:
##########
@@ -114,21 +114,80 @@ impl<'a> Predicate<'a> {
Predicate::IEqAscii(v) => BooleanArray::from_unary(array,
|haystack| {
haystack.eq_ignore_ascii_case(v) != negate
}),
- Predicate::Contains(finder) => BooleanArray::from_unary(array,
|haystack| {
- finder.find(haystack.as_bytes()).is_some() != negate
- }),
- Predicate::StartsWith(v) => BooleanArray::from_unary(array,
|haystack| {
- starts_with(haystack, v, equals_kernel) != negate
- }),
- Predicate::IStartsWithAscii(v) => BooleanArray::from_unary(array,
|haystack| {
- starts_with(haystack, v, equals_ignore_ascii_case_kernel) !=
negate
- }),
- Predicate::EndsWith(v) => BooleanArray::from_unary(array,
|haystack| {
- ends_with(haystack, v, equals_kernel) != negate
- }),
- Predicate::IEndsWithAscii(v) => BooleanArray::from_unary(array,
|haystack| {
- ends_with(haystack, v, equals_ignore_ascii_case_kernel) !=
negate
- }),
+ Predicate::Contains(finder) => {
+ if let Some(string_view_array) =
array.as_any().downcast_ref::<StringViewArray>() {
+ BooleanArray::from(
+ string_view_array
+ .bytes_iter()
+ .map(|haystack| finder.find(haystack).is_some() !=
negate)
+ .collect::<Vec<_>>(),
+ )
+ } else {
+ BooleanArray::from_unary(array, |haystack| {
+ finder.find(haystack.as_bytes()).is_some() != negate
+ })
+ }
+ }
+ Predicate::StartsWith(v) => {
+ if let Some(string_view_array) =
array.as_any().downcast_ref::<StringViewArray>() {
+ BooleanArray::from(
+ string_view_array
+ .prefix_iter(v.len())
+ .map(|haystack| starts_with(haystack, v,
equals_kernel) != negate)
+ .collect::<Vec<_>>(),
+ )
+ } else {
+ BooleanArray::from_unary(array, |haystack| {
+ starts_with(haystack, v, equals_kernel) != negate
+ })
+ }
+ }
+ Predicate::IStartsWithAscii(v) => {
+ if let Some(string_view_array) =
array.as_any().downcast_ref::<StringViewArray>() {
+ BooleanArray::from(
+ string_view_array
+ .prefix_iter(v.len())
+ .map(|haystack| {
+ starts_with(haystack, v,
equals_ignore_ascii_case_kernel) != negate
+ })
+ .collect::<Vec<_>>(),
+ )
+ } else {
+ BooleanArray::from_unary(array, |haystack| {
+ starts_with(haystack, v,
equals_ignore_ascii_case_kernel) != negate
+ })
+ }
+ }
+ Predicate::EndsWith(v) => {
+ if let Some(string_view_array) =
array.as_any().downcast_ref::<StringViewArray>() {
+ BooleanArray::from(
+ string_view_array
+ .suffix_iter(v.len())
Review Comment:
Given that the `starts_with` function just takes `bytes` I think you could
change the `prefix_iter` and `suffix_iter` to produce `&[u8]` and pass them
directly to the finder
##########
arrow-array/src/array/byte_view_array.rs:
##########
@@ -278,6 +279,22 @@ impl<T: ByteViewType + ?Sized> GenericByteViewArray<T> {
T::Native::from_bytes_unchecked(b)
}
+ /// Returns the bytes at index `i`
+ /// # Safety
+ /// Caller is responsible for ensuring that the index is within the bounds
of the array
+ pub unsafe fn bytes_unchecked(&self, idx: usize) -> &[u8] {
Review Comment:
Minor I think we can remove some duplication by calling `bytes_unchecked`
from `value_unchecked`:
```rust
/// Returns the element at index `i`
/// # Safety
/// Caller is responsible for ensuring that the index is within the
bounds of the array
pub unsafe fn value_unchecked(&self, idx: usize) -> &T::Native {
T::Native::from_bytes_unchecked(self.bytes_unchecked(idx))
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]