kylebarron opened a new pull request, #5070: URL: https://github.com/apache/arrow-rs/pull/5070
# Which issue does this PR close?

Progress for https://github.com/apache/arrow-rs/issues/5067. I can create a different issue if you need.

Closes #.

# Rationale for this change

There are cases when moving an FFI struct is necessary. In particular, I'm trying to implement #5067 to support the new [Arrow PyCapsule Interface](https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html). In that case, the PyCapsule object owns the ffi pointers, and only exposes a mutable reference to them.

The C Data Interface has a section about the [requirements of moving an array](https://arrow.apache.org/docs/format/CDataInterface.html#moving-an-array). In particular:

> The consumer can move the ArrowArray structure by bitwise copying or shallow member-wise copying. Then it MUST mark the source structure released (see “released structure” above for how to do it) but without calling the release callback. This ensures that only one live copy of the struct is active at any given time and that lifetime is correctly communicated to the producer.

I don't think there's a current public API to set `release` to `None` without calling the callback.

An alternative implementation of this would be to change `ffi::from_ffi` to take a mutable reference to `FFI_ArrowArray`, which would copy it under the hood. Adding a `copy` method like in this PR is likely to be less invasive?

Additionally, it seems I can't implement `Copy` directly on `FFI_ArrowArray` because "`Copy` not allowed on types with destructors".

With this change, I no longer get segfaults when copying a pyarrow array ([I did previously](https://github.com/apache/arrow-rs/issues/5067#issuecomment-1807225673)), using this code:

<details>

```rs
#[pyfunction]
pub fn read_array(ob: &'_ PyAny) -> PyResult<()> {
    let arr = pyobj_to_array(ob)?;
    println!("{:?}", arr);
    Ok(())
}

pub fn pyobj_to_array(ob: &'_ PyAny) -> PyResult<ArrayData> {
    if ob.hasattr("__arrow_c_array__")? {
        let tuple = ob.getattr("__arrow_c_array__")?.call0()?;
        if !tuple.is_instance_of::<PyTuple>() {
            return Err(PyTypeError::new_err(
                "Expected __arrow_c_array__ to return a tuple.",
            ));
        }

        let schema_capsule = tuple.get_item(0)?;
        if !schema_capsule.is_instance_of::<PyCapsule>() {
            return Err(PyTypeError::new_err(
                "Expected __arrow_c_array__ first element to be PyCapsule.",
            ));
        }
        let schema_capsule: &PyCapsule = PyTryInto::try_into(schema_capsule)?;
        let schema_capsule_name = schema_capsule.name()?;
        if schema_capsule_name.is_none() {
            return Err(PyValueError::new_err(
                "Expected PyCapsule to have name set.",
            ));
        }
        let schema_capsule_name = schema_capsule_name.unwrap().to_str()?;
        if schema_capsule_name != "arrow_schema" {
            return Err(PyValueError::new_err(
                "Expected name 'arrow_schema' in PyCapsule.",
            ));
        }

        let array_capsule = tuple.get_item(1)?;
        if !array_capsule.is_instance_of::<PyCapsule>() {
            return Err(PyTypeError::new_err(
                "Expected __arrow_c_array__ second element to be PyCapsule.",
            ));
        }
        let array_capsule: &PyCapsule = PyTryInto::try_into(array_capsule)?;
        let array_capsule_name = array_capsule.name()?;
        if array_capsule_name.is_none() {
            return Err(PyValueError::new_err(
                "Expected PyCapsule to have name set.",
            ));
        }
        let array_capsule_name = array_capsule_name.unwrap().to_str()?;
        if array_capsule_name != "arrow_array" {
            return Err(PyValueError::new_err(
                "Expected name 'arrow_array' in PyCapsule.",
            ));
        }

        let array_ptr = array_capsule.pointer();
        let array_ptr = array_ptr as *mut ffi::FFI_ArrowArray;
        let owned_array_ptr = unsafe { array_ptr.as_mut().unwrap().copy() };

        unsafe {
            println!(
                "is original released: {}",
                array_ptr.as_mut().unwrap().is_released()
            );
        };

        let arr = unsafe {
            ffi::from_ffi(
                owned_array_ptr,
                schema_capsule.reference::<ffi::FFI_ArrowSchema>(),
            )
            .unwrap()
        };
        return Ok(arr);
    }

    Err(PyValueError::new_err(
        "Expected an object with dunder __arrow_c_array__",
    ))
}
```

</details>

# What changes are included in this PR?

Add a `copy()` method onto `FFI_ArrowArray`

# Are there any user-facing changes?

Add a `copy()` method onto `FFI_ArrowArray`
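
To illustrate the move rule quoted from the C Data Interface in the rationale, here is a minimal, self-contained sketch. It does not use the arrow crate: `DemoArray`, `release_demo` and `RELEASE_CALLS` are hypothetical names invented for this example, and the `copy()` method here only mirrors the name of the one added in this PR; the actual implementation on `FFI_ArrowArray` may differ. The sketch does a shallow member-wise copy and then marks the source released without calling the release callback, so the callback runs exactly once.

```rust
use std::ffi::c_void;
use std::ptr;
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how often the release callback runs (for demonstration only).
static RELEASE_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a C Data Interface struct (not `FFI_ArrowArray`).
#[repr(C)]
struct DemoArray {
    length: i64,
    /// `None` marks the structure as released.
    release: Option<unsafe extern "C" fn(*mut DemoArray)>,
    private_data: *mut c_void,
}

/// Producer-side callback: frees the payload and marks the struct released.
unsafe extern "C" fn release_demo(array: *mut DemoArray) {
    unsafe {
        drop(Box::from_raw((*array).private_data as *mut Vec<i32>));
        (*array).private_data = ptr::null_mut();
        (*array).release = None;
    }
    RELEASE_CALLS.fetch_add(1, Ordering::SeqCst);
}

impl DemoArray {
    fn new(values: Vec<i32>) -> Self {
        Self {
            length: values.len() as i64,
            release: Some(release_demo),
            private_data: Box::into_raw(Box::new(values)) as *mut c_void,
        }
    }

    fn is_released(&self) -> bool {
        self.release.is_none()
    }

    /// Moves the struct out of `self`: shallow member-wise copy, then mark
    /// the source released WITHOUT calling the release callback.
    fn copy(&mut self) -> Self {
        let moved = Self {
            length: self.length,
            release: self.release,
            private_data: self.private_data,
        };
        self.release = None;
        moved
    }
}

impl Drop for DemoArray {
    fn drop(&mut self) {
        // Only a live (non-released) struct may invoke the callback.
        if let Some(release) = self.release {
            unsafe { release(self) };
        }
    }
}

fn main() {
    let before = RELEASE_CALLS.load(Ordering::SeqCst);
    let mut source = DemoArray::new(vec![1, 2, 3]);
    let owned = source.copy();

    assert!(source.is_released());
    assert!(!owned.is_released());
    assert_eq!(owned.length, 3);

    drop(source); // no-op: the source was marked released by `copy()`
    drop(owned); // the single live copy runs the callback
    assert_eq!(RELEASE_CALLS.load(Ordering::SeqCst), before + 1);
}
```

Because the source is marked released before it is dropped, its destructor does nothing, which is the property that avoids a double release when the PyCapsule later cleans up its own pointer.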
