kylebarron opened a new issue, #6151:
URL: https://github.com/apache/arrow-rs/issues/6151

   **Describe the bug**
   
   In https://github.com/kylebarron/arro3 I'm exporting arrow-rs functionality for general Python use. I seem to have hit a bug importing sliced arrays.
   
   In [`import_array_pycapsules`](https://github.com/kylebarron/arro3/blob/9673b62161f94d41a06c71f895abb8662b8e2864/pyo3-arrow/src/ffi/from_python/utils.rs#L72-L89) (which is vendored from arrow-rs code [here](https://github.com/apache/arrow-rs/blob/80ed7128510bac114c6feec08c34ef3beed3a44a/arrow/src/pyarrow.rs#L265-L275)) I have:
   
   ```rs
   pub(crate) fn import_array_pycapsules(
       schema_capsule: &Bound<PyCapsule>,
       array_capsule: &Bound<PyCapsule>,
   ) -> PyResult<(ArrayRef, Field)> {
       validate_pycapsule_name(schema_capsule, "arrow_schema")?;
       validate_pycapsule_name(array_capsule, "arrow_array")?;
   
       let schema_ptr = unsafe { schema_capsule.reference::<FFI_ArrowSchema>() };
       let array = unsafe { FFI_ArrowArray::from_raw(array_capsule.pointer() as _) };
   
       let array_data = unsafe { arrow::ffi::from_ffi(array, schema_ptr) }
           .map_err(|err| PyTypeError::new_err(err.to_string()))?;
       dbg!(array_data.offset());
   
       let field = Field::try_from(schema_ptr)
           .map_err(|err| PyTypeError::new_err(err.to_string()))?;
       let array = make_array(array_data);
   
       dbg!(array.offset());
       Ok((array, field))
   }
   ```
   
   Note the two `dbg!` macros. When this is invoked from Python with a sliced pyarrow `StructArray`, the array offset is lost.
   
   ```py
   import pyarrow as pa
   import pytest
   from arro3.compute import struct_field
   
   a = pa.array([1, 2, 3])
   b = pa.array([3, 4, 5])
   struct_arr = pa.StructArray.from_arrays([a, b], names=["a", "b"])
   sliced = struct_arr.slice(1, 2)
   sliced.offset # 1
   pa.array(struct_field(sliced, [0]))
   # <pyarrow.lib.Int64Array object at 0x10fa94700>
   # [
   #   1,
   #   2
   # ]
   ```
   
   Note that the _first two_ elements of `a` are returned (`[1, 2]`) instead of the expected `[2, 3]`: the `offset` is not applied. I've isolated this to the two lines with `dbg!`. Those print:
   ```
   [pyo3-arrow/src/ffi/from_python/utils.rs:84:5] array_data.offset() = 1
   [pyo3-arrow/src/ffi/from_python/utils.rs:87:5] array.offset() = 0
   ```
   In particular `make_array` does not check the `offset` from the base array.
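
To make the symptom concrete, here is a minimal plain-Python sketch of the offset semantics involved. It is not arrow-rs or pyarrow code: `StructData`, `field_ignoring_offset`, and `field_applying_offset` are hypothetical names used only to model a struct array as child buffers plus a logical `offset` and `length`.

```python
from dataclasses import dataclass


@dataclass
class StructData:
    """Hypothetical stand-in for a struct array: children plus a logical window."""

    children: list[list[int]]
    offset: int
    length: int


def field_ignoring_offset(data: StructData, i: int) -> list[int]:
    # Models the observed behavior: only the length is honored.
    return data.children[i][: data.length]


def field_applying_offset(data: StructData, i: int) -> list[int]:
    # Models the expected behavior: the child is windowed by the
    # parent's offset and length.
    return data.children[i][data.offset : data.offset + data.length]


# Same data as the pyarrow example: slice(1, 2) of a 3-element struct array.
sliced = StructData(children=[[1, 2, 3], [3, 4, 5]], offset=1, length=2)
observed = field_ignoring_offset(sliced, 0)
expected = field_applying_offset(sliced, 0)
print(observed)  # [1, 2]
print(expected)  # [2, 3]
```

The first result matches what `struct_field` currently returns for the sliced array; the second is what a correct import should produce.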
   
   **To Reproduce**
   
   To reproduce the bug from the arro3 side:
   ```
   git clone https://github.com/kylebarron/arro3
   cd arro3
   git checkout 9673b62
   poetry install
   poetry run maturin develop -m arro3-core/Cargo.toml
   poetry run maturin develop -m arro3-compute/Cargo.toml
   poetry run pytest
   ```
   
   I can _try_ to reproduce this in pure Rust if needed, but that may not be possible because a `StructArray` seems to always be exported with an `offset` of `0`, so this import behavior may not be easy to trigger.
   
   **Expected behavior**
   
   The array offset should be preserved on import, so that the sliced example above yields `[2, 3]`.
   

