viirya commented on code in PR #50327:
URL: https://github.com/apache/arrow/pull/50327#discussion_r3509800101


##########
python/pyarrow/array.pxi:
##########
@@ -3974,6 +4155,45 @@ cdef class StringArray(Array):
     Concrete class for Arrow arrays of string (or utf8) data type.
     """
 
+    def to_pylist(self, *, maps_as_pydicts=None):
+        """
+        Convert to a list of native Python objects.
+
+        Parameters
+        ----------
+        maps_as_pydicts : str, optional, default `None`
+            Valid values are `None`, 'lossy', or 'strict'.
+            This parameter is ignored for non-nested Arrays.
+
+        Returns
+        -------
+        lst : list
+        """
+        cdef:
+            CStringArray* arr = <CStringArray*> self.ap
+            int64_t i, n
+            int32_t length
+            const uint8_t* data
+        self._assert_cpu()
+        n = arr.length()
+        result = []
+        # Decode values straight from the data buffer instead of creating
+        # a C++ Scalar and a Python Scalar wrapper per value (see GH-28694).
+        if arr.null_count() == 0:
+            for i in range(n):
+                data = arr.GetValue(i, &length)
+                result.append(
+                    cp.PyUnicode_DecodeUTF8(<const char*> data, length, NULL))
+        else:
+            for i in range(n):
+                if arr.IsNull(i):
+                    result.append(None)
+                else:
+                    data = arr.GetValue(i, &length)
+                    result.append(
+                        cp.PyUnicode_DecodeUTF8(<const char*> data, length, NULL))
+        return result

Review Comment:
   `null_count()` is a one-time vectorized popcount over the validity bitmap
(~n/8 bytes, well under a millisecond for 2M rows), and the result is cached per
`ArrayData`. In exchange, the no-null branch skips the per-element `IsNull()`
check entirely. Branching on `null_bitmap_data() == NULL` instead would save
that single scan but penalize a common case: a sliced or combined array that
*has* a bitmap yet contains no nulls in range would take the per-element
`IsNull()` path on every call. So the current form should be the better
trade-off in practice.
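
To make the sliced-array case concrete, here is a rough pure-Python sketch of the two branching strategies. It only models the idea: the helper names (`count_nulls`, `pick_path_by_null_count`, `pick_path_by_bitmap_presence`) are hypothetical and not part of pyarrow, the bitmap is a plain `bytes` object using Arrow's least-significant-bit ordering, and the real `null_count()` is a vectorized popcount rather than a Python loop.

```python
def count_nulls(bitmap, offset, length):
    """Count unset validity bits in [offset, offset + length).

    `bitmap` is None when the array has no validity buffer at all.
    Bit i lives in byte i // 8 at position i % 8 (LSB first).
    """
    if bitmap is None:
        return 0
    nulls = 0
    for i in range(offset, offset + length):
        if not (bitmap[i // 8] >> (i % 8)) & 1:
            nulls += 1
    return nulls


def pick_path_by_null_count(bitmap, offset, length):
    # Strategy in the PR: one scan up front, then choose the loop.
    return "fast" if count_nulls(bitmap, offset, length) == 0 else "per-element"


def pick_path_by_bitmap_presence(bitmap, offset, length):
    # Alternative: no scan, but any array carrying a bitmap is treated
    # as if it might contain nulls.
    return "fast" if bitmap is None else "per-element"


# Hypothetical parent array of 16 values where only index 0 is null.
parent_bitmap = bytes([0b11111110, 0b11111111])

# A slice covering indices 8..15 shares the parent's bitmap but has no nulls.
slice_offset, slice_length = 8, 8

by_count = pick_path_by_null_count(parent_bitmap, slice_offset, slice_length)
by_presence = pick_path_by_bitmap_presence(parent_bitmap, slice_offset, slice_length)
```

In this sketch the slice takes the fast loop under the null-count check, while the bitmap-presence check sends it down the per-element loop even though no value in range is null.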


