wjones127 opened a new issue, #4805:
URL: https://github.com/apache/arrow-rs/issues/4805

   **Describe the bug**
   
   When exporting a record batch containing a tensor array (a kind of extension array), PyArrow segfaults. This does not happen if the batch is exported as a stream (see the sketch after the reproduction below).
   
   **To Reproduce**
   
   The following test will fail (with a segfault) in `arrow-pyarrow-integration-testing/tests/test_sql.py`:
   
   ```python
   def test_tensor_array():
       tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
       inner = pa.array([float(x) for x in range(1, 7)] + [None] * 12, pa.float32())
       storage = pa.FixedSizeListArray.from_arrays(inner, 6)
       f32_array = pa.ExtensionArray.from_storage(tensor_type, storage)
   
       # Round-tripping as an array gives back the storage type, because
       # arrow-rs has no notion of extension types.
       b = rust.round_trip_array(f32_array)
       assert b == f32_array.storage
   
       # Round-tripping as a record batch is what currently segfaults.
       batch = pa.record_batch([f32_array], ["tensor"])
       b = rust.round_trip_record_batch(batch)
       assert b == batch
   
       del b
   ```
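
   For contrast, a minimal sketch of the stream-based round trip mentioned above, which does not crash. It assumes the test module exposes a `round_trip_record_batch_reader` helper (that name is my assumption; substitute whatever stream round-trip function the module actually exports):

   ```python
   def test_tensor_array_stream():
       # Same extension array as in the failing test above.
       tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
       inner = pa.array([float(x) for x in range(1, 7)] + [None] * 12, pa.float32())
       storage = pa.FixedSizeListArray.from_arrays(inner, 6)
       f32_array = pa.ExtensionArray.from_storage(tensor_type, storage)
       batch = pa.record_batch([f32_array], ["tensor"])

       # The stream interface exports schema and data together, so the
       # extension metadata stays attached to its column.
       reader = pa.RecordBatchReader.from_batches(batch.schema, [batch])
       b = rust.round_trip_record_batch_reader(reader)
       assert b.read_all() == pa.Table.from_batches([batch])
   ```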
   
   **Expected behavior**
   
   The record batch should round-trip successfully, preserving the extension type, instead of crashing.
   
   **Additional context**
   
   Record batch export is implemented by exporting each array individually, but this separates the extension arrays from their metadata. I suspect PyArrow segfaults because it receives a plain array and is only told later, via the final schema, that the column is an extension type.
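
   To make the suspected split concrete, here is a minimal PyArrow sketch (illustration only, not the actual FFI code path) of where the extension identity lives:

   ```python
   import pyarrow as pa

   tensor_type = pa.fixed_shape_tensor(pa.float32(), [2, 3])
   inner = pa.array([float(x) for x in range(1, 7)], pa.float32())
   storage = pa.FixedSizeListArray.from_arrays(inner, 6)
   ext = pa.ExtensionArray.from_storage(tensor_type, storage)

   # The array itself carries the extension type...
   print(ext.type)          # extension<arrow.fixed_shape_tensor>
   # ...but its storage, which is all arrow-rs can represent, does not.
   print(ext.storage.type)  # fixed_size_list<item: float>[6]

   # Over the C Data Interface, the extension identity travels as field
   # metadata ("ARROW:extension:name") on the schema. Exporting each column
   # as a bare array and the schema separately splits the two apart.
   ```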

