alamb commented on code in PR #6671:
URL: https://github.com/apache/arrow-rs/pull/6671#discussion_r1826542917


##########
arrow-string/src/length.rs:
##########
@@ -137,6 +137,26 @@ pub fn bit_length(array: &dyn Array) -> Result<ArrayRef, ArrowError> {
             let list = array.as_string::<i64>();
             Ok(bit_length_impl::<Int64Type>(list.offsets(), list.nulls()))
         }
+        DataType::Utf8View => {
+            let string_view_array = array
+                .as_any()
+                .downcast_ref::<StringViewArray>()
+                .ok_or_else(|| ArrowError::ComputeError("Expected Utf8View array".to_string()))?;
+            let mut bit_lengths = Vec::with_capacity(array.len());
+            for i in 0..array.len() {
+                let bit_length = if string_view_array.is_valid(i) {
+                    (string_view_array.value(i).len() * 8) as i32

Review Comment:
   This code could be made significantly faster by reading the lengths directly from the views rather than materializing each string with `value(i)` just to take its length.
   
   You can get the views with `views()`: https://docs.rs/arrow/latest/arrow/array/type.StringViewArray.html#method.views
   
   The layout is described here: https://docs.rs/arrow/latest/arrow/array/struct.GenericByteViewArray.html#layout-views-and-buffers
   
   Something like
   
   ```rust
   let lengths = string_view_array
       .views()
       .iter()
       // the low 32 bits of each u128 view hold the string length in bytes
       .map(|view| *view as u32);
   ```
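   To make the view-layout idea concrete, here is a minimal std-only sketch. It assumes the layout documented for `GenericByteViewArray`: each view is a little-endian `u128` whose low 32 bits are the string length in bytes; strings of up to 12 bytes are stored inline in the remaining 12 bytes, and longer ones carry a 4-byte prefix, a 4-byte buffer index and a 4-byte offset. The helpers `make_inline_view`, `make_buffer_view` and `bit_lengths_from_views` are hypothetical and only model that layout; they are not arrow-rs APIs.

   ```rust
   /// Hypothetical helper: builds an inline view for a string of at most 12 bytes.
   /// Bytes 0..4 hold the length, bytes 4..16 hold the zero-padded data.
   fn make_inline_view(s: &str) -> u128 {
       let bytes = s.as_bytes();
       assert!(bytes.len() <= 12, "inline views hold at most 12 bytes");
       let mut raw = [0u8; 16];
       raw[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
       raw[4..4 + bytes.len()].copy_from_slice(bytes);
       u128::from_le_bytes(raw)
   }

   /// Hypothetical helper: builds a view for a string longer than 12 bytes.
   /// Bytes 0..4 length, 4..8 prefix, 8..12 buffer index, 12..16 offset.
   fn make_buffer_view(s: &str, buffer_index: u32, offset: u32) -> u128 {
       let bytes = s.as_bytes();
       assert!(bytes.len() > 12, "longer strings are stored out of line");
       let mut raw = [0u8; 16];
       raw[0..4].copy_from_slice(&(bytes.len() as u32).to_le_bytes());
       raw[4..8].copy_from_slice(&bytes[0..4]);
       raw[8..12].copy_from_slice(&buffer_index.to_le_bytes());
       raw[12..16].copy_from_slice(&offset.to_le_bytes());
       u128::from_le_bytes(raw)
   }

   /// Bit length of every slot, read straight from the low 32 bits of each view.
   /// No string bytes are touched and there is no per-row validity branch;
   /// null slots are expected to be masked separately by the null buffer.
   /// Assumes every string is shorter than 2^28 bytes so the i32 product
   /// cannot overflow (the PR's code also casts to i32).
   fn bit_lengths_from_views(views: &[u128]) -> Vec<i32> {
       views
           .iter()
           .map(|view| (*view as u32 as i32) * 8)
           .collect()
   }
   ```

   For example, views built for `"hello"`, `""` and the 33-byte string `"a string longer than twelve bytes"` give `[40, 0, 264]`. Because validity is not consulted per row, the null buffer can simply be carried over to the result, similar to how the existing offset-based path passes `list.nulls()` to `bit_length_impl`.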
   
   



##########
arrow-string/src/length.rs:
##########
@@ -137,6 +137,26 @@ pub fn bit_length(array: &dyn Array) -> Result<ArrayRef, ArrowError> {
             let list = array.as_string::<i64>();
             Ok(bit_length_impl::<Int64Type>(list.offsets(), list.nulls()))
         }
+        DataType::Utf8View => {
+            let string_view_array = array
+                .as_any()
+                .downcast_ref::<StringViewArray>()
+                .ok_or_else(|| ArrowError::ComputeError("Expected Utf8View array".to_string()))?;

Review Comment:
   I think it would be more consistent with the rest of the codebase to use `array.as_string_view()` here rather than `downcast_ref`.
   
   I think this is correct, but I would recommend changing the code to be consistent.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
