Re: [PR] Specialize Prefix Match for `Like/ILike` between Array and Scalar for StringViewArray [arrow-rs]

via GitHub Mon, 19 Aug 2024 06:56:12 -0700


alamb commented on code in PR #6231:
URL: https://github.com/apache/arrow-rs/pull/6231#discussion_r1721832656



##########
arrow-array/src/array/byte_view_array.rs:
##########
@@ -667,6 +699,58 @@ impl StringViewArray {
             None => true,
         })
     }
+
+    /// Returns an iterator over the prefix bytes of this array with respect to the prefix length.
+    /// If the prefix length is larger than the string length, it will return the empty string.
+    pub fn prefix_iter(&self, prefix_len: usize) -> impl Iterator<Item = &str> {
+        self.views().into_iter().map(move |v| {
+            let len = (*v as u32) as usize;
+
+            if len < prefix_len {
+                return "";
+            }
+
+            let b = if prefix_len <= 4 || len <= 12 {
+                unsafe { StringViewArray::inline_value(v, prefix_len) }
+            } else {
+                let view = ByteView::from(*v);
+                let data = unsafe {
+                    self.data_buffers()
+                        .get_unchecked(view.buffer_index as usize)
+                };
+                let offset = view.offset as usize;
+                unsafe { data.get_unchecked(offset..offset + prefix_len) }
+            };
+
+            unsafe { str::from_utf8_unchecked(b) }
+        })
+    }
+
+    /// Returns an iterator over the suffix bytes of this array with respect to the suffix length.
+    /// If the suffix length is larger than the string length, it will return the empty string.
+    pub fn suffix_iter(&self, suffix_len: usize) -> impl Iterator<Item = &str> {
+        self.views().into_iter().map(move |v| {
+            let len = (*v as u32) as usize;
+
+            if len < suffix_len {
+                return "";
+            }
+
+            let b = if len <= 12 {
+                unsafe { &StringViewArray::inline_value(v, len)[len - suffix_len..] }
+            } else {
+                let view = ByteView::from(*v);
+                let data = unsafe {
+                    self.data_buffers()
+                        .get_unchecked(view.buffer_index as usize)
+                };
+                let offset = view.offset as usize;
+                unsafe { data.get_unchecked(offset + len - suffix_len..offset + len) }

Review Comment:
   How do we know this byte sequence is valid UTF-8?
   
   For example, what if the last character of a string in the array is a multi-byte UTF-8 character and we call `suffix_iter(1)`? That would return a 1-byte `str` that isn't valid UTF-8.
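
   To make the concern concrete, here is a minimal sketch using only the standard library rather than the `StringViewArray` internals. Both helper names (`raw_suffix_is_valid_utf8`, `checked_suffix`) are hypothetical and not part of arrow-rs. The first applies the same byte arithmetic as the diff but validates the result, and the second shows one possible boundary-aware alternative.

```rust
/// Slices the last `suffix_len` bytes of `s`, using the same offset
/// arithmetic as the diff, and reports whether they are valid UTF-8.
/// Hypothetical helper for illustration only; not part of arrow-rs.
fn raw_suffix_is_valid_utf8(s: &str, suffix_len: usize) -> bool {
    let bytes = s.as_bytes();
    match bytes.len().checked_sub(suffix_len) {
        // Validate instead of calling `from_utf8_unchecked`.
        Some(start) => std::str::from_utf8(&bytes[start..]).is_ok(),
        // Suffix longer than the string: nothing to slice.
        None => false,
    }
}

/// Returns the last `suffix_len` bytes of `s` as a `&str`, or `None` when
/// the suffix is longer than the string or the cut would fall inside a
/// multi-byte character. Hypothetical helper for illustration only.
fn checked_suffix(s: &str, suffix_len: usize) -> Option<&str> {
    let start = s.len().checked_sub(suffix_len)?;
    // `str::get` returns `None` if `start` is not on a char boundary.
    s.get(start..)
}
```

   With `"café"` (5 bytes, because `é` encodes as 2 bytes), a 1-byte suffix is the lone continuation byte `0xA9`, so `raw_suffix_is_valid_utf8("café", 1)` is `false` and `checked_suffix("café", 1)` is `None`, while a 2-byte suffix yields `"é"`. The same reasoning applies to a prefix cut at a non-boundary offset. Note that this sketch returns `None` for an over-long suffix, whereas the code in the diff returns the empty string.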



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

