paleolimbot opened a new pull request, #451:
URL: https://github.com/apache/arrow-nanoarrow/pull/451
This PR updates the logic that creates a "shallow copy" of an `ArrowArray`.
Previously, it made a shallow copy of only the outer array, which works in most
cases. However, the spec allows a consumer to "move" child arrays, so a valid
*outer* array guarantees nothing about the validity of its `ArrowArray*` child
pointers. It's difficult, but possible, to trigger this from Python (below);
however, I think this sort of code is common for importers, because it allows
the lifecycle of each column to be independent of its parent (a sketch of that
pattern appears at the end of this description).
```python
import nanoarrow as na
import pyarrow as pa

# Given some array
array = na.c_array_from_buffers(
    na.struct({"col1": na.int32()}),
    3,
    [None],
    children=[na.c_array([1, 2, 3], na.int32())],
)
user_array = na.Array(array)
user_array
#> nanoarrow.Array<int32>[3]
#> {'col1': 1}
#> {'col1': 2}
#> {'col1': 3}

# Totally valid shallow copy of this array
schema_capsule, array_capsule = array.__arrow_c_array__()
array2 = na.c_array(array_capsule, schema_capsule)

# A consumer is technically allowed to move a child array
x = pa.Array._import_from_c(array2.child(0)._addr(), pa.int32())
del array
del x

# With the previous shallow copy implementation, this could segfault or fail.
# Instead of a segfault I tend to get:
list(user_array.iter_py())
#> NanoarrowException: ArrowBasicArrayStreamValidate() failed (22): Expected
#> int32 array buffer 1 to have size >= 12 bytes but found buffer with 0 bytes
```
After this PR, the original array remains valid:
```python
list(user_array.iter_py())
#> [{'col1': 1}, {'col1': 2}, {'col1': 3}]
```
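For context, here is a minimal sketch of the importer pattern mentioned above
(this is not code from this PR; the column names and values are made up for
illustration). A consumer moves each column out of a struct array so that the
columns' lifecycles are independent of the parent:
```python
import nanoarrow as na
import pyarrow as pa

# Hypothetical struct array with two columns (illustrative data only)
parent = na.c_array_from_buffers(
    na.struct({"col1": na.int32(), "col2": na.int32()}),
    3,
    [None],
    children=[
        na.c_array([1, 2, 3], na.int32()),
        na.c_array([4, 5, 6], na.int32()),
    ],
)

# pa.Array._import_from_c() *moves* the child struct it is handed, so after
# this loop each column's lifecycle is independent of the parent
columns = [
    pa.Array._import_from_c(parent.child(i)._addr(), pa.int32())
    for i in range(2)
]

# Releasing the parent must not invalidate the moved columns
del parent
columns[0].to_pylist()
#> [1, 2, 3]
```
Because this is a legal use of the C Data Interface, a shallow copy can't share
child `ArrowArray` structs with the original and assume they will stay valid
for the lifetime of the outer array.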