jhorstmann commented on a change in pull request #8260:
URL: https://github.com/apache/arrow/pull/8260#discussion_r494974111



##########
File path: rust/arrow/src/compute/kernels/length.rs
##########
@@ -17,52 +17,56 @@
 
 //! Defines kernel for length of a string array
 
-use crate::array::*;
+use crate::datatypes::ToByteSlice;
+use crate::{array::*, buffer::Buffer};
 use crate::{
     datatypes::DataType,
-    datatypes::UInt32Type,
     error::{ArrowError, Result},
 };
 use std::sync::Arc;
 
-/// Returns an array of UInt32 denoting the number of characters in each string in the array.
+fn length_string<OffsetSize>(array: &Array, data_type: DataType) -> Result<ArrayRef>
+where
+    OffsetSize: OffsetSizeTrait,
+{
+    // note: offsets are stored as u8, but they can be interpreted as OffsetSize
+    let offsets = array.data_ref().clone().buffers()[0].clone();
+    // this is a 30% improvement over iterating over u8s and building OffsetSize, which
+    // justifies the usage of `unsafe`.
+    let slice: &[OffsetSize] = unsafe { offsets.typed_data::<OffsetSize>() };

Review comment:
       To support sliced arrays, this needs to take the offset of the array into account. The following should work, but a test case would be nice:
   
   ```suggestion
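       // Skip the offsets of elements that precede this slice in the parent buffer;
       // the leading `&` is needed because indexing a slice yields an unsized `[T]`.
       let slice: &[OffsetSize] = &unsafe { offsets.typed_data::<OffsetSize>() }[array.offset()..];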
   ```
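To illustrate why the array offset matters, here is a minimal standalone sketch that does not depend on the arrow crate. `Offset` is a hypothetical stand-in for `OffsetSizeTrait`, and `string_lengths` is a hypothetical helper; bounding the window to `len + 1` offsets is an assumption about how the kernel consumes the offsets, not something taken from the PR. Note that differences between consecutive offsets are byte counts.

```rust
/// Hypothetical stand-in for arrow's `OffsetSizeTrait`: an offset type
/// (i32 for `Utf8`, i64 for `LargeUtf8`) that can be converted to `usize`.
trait Offset: Copy {
    fn to_usize(self) -> usize;
}

impl Offset for i32 {
    fn to_usize(self) -> usize {
        self as usize
    }
}

impl Offset for i64 {
    fn to_usize(self) -> usize {
        self as usize
    }
}

/// Hypothetical helper: byte length of each string in a (possibly sliced) array.
///
/// `offsets` is the full offsets buffer of the parent array, `array_offset`
/// plays the role of `array.offset()`, and `len` the number of slice elements.
fn string_lengths<O: Offset>(offsets: &[O], array_offset: usize, len: usize) -> Vec<usize> {
    // A slice of `len` strings needs `len + 1` offsets, starting at `array_offset`.
    let window = &offsets[array_offset..array_offset + len + 1];
    window
        .windows(2)
        .map(|w| w[1].to_usize() - w[0].to_usize())
        .collect()
}

fn main() {
    // Offsets for the strings ["a", "bcd", "", "ef"].
    let offsets: Vec<i32> = vec![0, 1, 4, 4, 6];
    // Unsliced array: all four elements.
    assert_eq!(string_lengths(&offsets, 0, 4), vec![1, 3, 0, 2]);
    // Sliced array covering elements 1..3, i.e. ["bcd", ""].
    assert_eq!(string_lengths(&offsets, 1, 2), vec![3, 0]);
    // Ignoring the offset would wrongly report the lengths of ["a", "bcd"].
    assert_eq!(string_lengths(&offsets, 0, 2), vec![1, 3]);
}
```

A test along the lines of the last two assertions (same `len`, different offset) is the kind of case that would catch the missing `array.offset()`.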



