paleolimbot opened a new pull request, #451:
URL: https://github.com/apache/arrow-nanoarrow/pull/451
This PR updates the logic that creates a "shallow copy" of an `ArrowArray`.
Previously, it made a shallow copy of only the outer array, which works in most
cases. However, the spec allows a consumer to "move" child arrays, so a valid
*outer* array guarantees nothing about the validity of its `ArrowArray*` child
pointers. It's difficult, but possible, to trigger this from Python (below);
however, I think this sort of code is common for importers, because it allows
the lifecycle of each column to be independent of its parent (a sketch of that
pattern appears at the end of this description).
```python
import nanoarrow as na
import pyarrow as pa

# Given some array
array = na.c_array_from_buffers(
    na.struct({"col1": na.int32()}),
    3,
    [None],
    children=[na.c_array([1, 2, 3], na.int32())],
)
user_array = na.Array(array)
user_array
#> nanoarrow.Array<int32>[3]
#> {'col1': 1}
#> {'col1': 2}
#> {'col1': 3}

# Totally valid shallow copy of this array
schema_capsule, array_capsule = array.__arrow_c_array__()
array2 = na.c_array(array_capsule, schema_capsule)

# A consumer is technically allowed to move a child array
x = pa.Array._import_from_c(array2.child(0)._addr(), pa.int32())
del array
del x

# With the previous shallow copy implementation, this could segfault or fail.
# Instead of a segfault I tend to get:
list(user_array.iter_py())
#> NanoarrowException: ArrowBasicArrayStreamValidate() failed (22): Expected
#> int32 array buffer 1 to have size >= 12 bytes but found buffer with 0 bytes
```
After this PR, the original array remains valid:
```python
list(user_array.iter_py())
#> [{'col1': 1}, {'col1': 2}, {'col1': 3}]
```
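For context, here is a minimal sketch of the importer pattern mentioned above
(this is not code from this PR; the column names and values are made up for
illustration). A consumer moves each column out of a struct array so that the
columns' lifecycles are independent of the parent:
```python
import nanoarrow as na
import pyarrow as pa

# Hypothetical struct array with two columns (illustrative data only)
parent = na.c_array_from_buffers(
    na.struct({"col1": na.int32(), "col2": na.int32()}),
    3,
    [None],
    children=[
        na.c_array([1, 2, 3], na.int32()),
        na.c_array([4, 5, 6], na.int32()),
    ],
)

# pa.Array._import_from_c() *moves* the child struct it is handed, so after
# this loop each column's lifecycle is independent of the parent
columns = [
    pa.Array._import_from_c(parent.child(i)._addr(), pa.int32())
    for i in range(2)
]

# Releasing the parent must not invalidate the moved columns
del parent
columns[0].to_pylist()
#> [1, 2, 3]
```
Because this is a legal use of the C Data Interface, a shallow copy can't share
child `ArrowArray` structs with the original and assume they will stay valid
for the lifetime of the outer array.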