alamb commented on code in PR #16877:
URL: https://github.com/apache/datafusion/pull/16877#discussion_r2228560541


##########
datafusion/functions/src/unicode/character_length.rs:
##########
@@ -136,56 +136,31 @@ where
     // string is ASCII only is relatively cheap.
     // If strings are ASCII only, count bytes instead.
     let is_array_ascii_only = array.is_ascii();
-    let array = if array.null_count() == 0 {
+    let nulls = array.nulls().cloned();
+    let array = {
         if is_array_ascii_only {
             let values: Vec<_> = (0..array.len())
                 .map(|i| {
-                    let value = array.value(i);
+                    // Safety: we are iterating with array.len() so the index 
is always valid
+                    let value = unsafe { array.value_unchecked(i) };
                     T::Native::usize_as(value.len())
                 })
                 .collect();
-            PrimitiveArray::<T>::new(values.into(), None)
+            PrimitiveArray::<T>::new(values.into(), nulls)
         } else {
             let values: Vec<_> = (0..array.len())
                 .map(|i| {
-                    let value = array.value(i);
+                    // Safety: we are iterating with array.len() so the index 
is always valid

Review Comment:
   Another idea here is if you know the values are always ascii, you can avoid 
making a `&str` at all -- and instead simply compute the character lengths 
based on the offsets array (for StringArray and LargeStringArray) or the views 
for `StringViewArray)



##########
datafusion/functions/src/unicode/character_length.rs:
##########
@@ -136,56 +136,31 @@ where
     // string is ASCII only is relatively cheap.
     // If strings are ASCII only, count bytes instead.
     let is_array_ascii_only = array.is_ascii();
-    let array = if array.null_count() == 0 {
+    let nulls = array.nulls().cloned();

Review Comment:
   Is the idea to remove the no-nulls optimization because it doesn't make 
things faster?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to