paleolimbot opened a new issue, #50246:
URL: https://github.com/apache/arrow/issues/50246

   ### Describe the enhancement requested
   
   In SedonaDB we're using binary view arrays to pass around large buffers,
   and in the process we discovered a number of places where these buffers are
   copied. This is a somewhat non-standard use of BinaryView, so most of those
   copies are no problem, but the copy during scalar extraction was surprising.
   It seems like direct access to the view's underlying buffer would be a
   useful feature, possibly for regular binary/string arrays as well. Our
   workaround is at https://github.com/apache/sedona-db/pull/999, but here's a
   more minimal reproducer:
   
   ```python
   import pyarrow as pa
   import numpy as np
   
   big_bytes_array = b"124938ls" * 100
   buf = np.arange(1000, dtype=np.uint8)
   
   # Creating via a memoryview doesn't keep the original memory
   pa_array = pa.array([memoryview(buf)], pa.binary_view())
   buf_from_pa_array_via_memoryview = np.frombuffer(pa_array.buffers()[2])
   np.shares_memory(buf_from_pa_array_via_memoryview, buf)
   #> False
   
   # You can force this by creating a binary array manually and casting to a view
   pa_array = pa.Array.from_buffers(
       type=pa.binary(),
       length=1,
       buffers=[
           None,
           pa.py_buffer(np.array([0, 1000], dtype=np.int32)),
           pa.py_buffer(buf),
       ],
   ).cast(pa.binary_view())
   
   buf_from_pa_array_via_memoryview = np.frombuffer(pa_array.buffers()[2])
   np.shares_memory(buf_from_pa_array_via_memoryview, buf)
   #> True
   
   # However, the act of extracting a scalar forces a copy
   buf_from_pa_array_scalar = np.frombuffer(pa_array[0].as_buffer())
   np.shares_memory(buf_from_pa_array_scalar, buf)
   #> False
   ```
   
   ### Component(s)
   
   Python

