paleolimbot opened a new issue, #50246:
URL: https://github.com/apache/arrow/issues/50246

### Describe the enhancement requested

In SedonaDB we're using `binary_view` arrays to pass around large buffers, and in the process we discovered a number of places where these buffers are copied. This is a somewhat non-standard use of BinaryView, so it's no problem in general, but the copy during scalar extraction was surprising. It seems like direct access to the view's buffer would be a useful feature, possibly for regular binary/string arrays as well.

Our workaround is at https://github.com/apache/sedona-db/pull/999 but here's a more minimal reproducer:

```python
import pyarrow as pa
import numpy as np

big_bytes_array = b"124938ls" * 100
buf = np.arange(1000, dtype=np.uint8)

# Creating via a memoryview doesn't keep the original memory
pa_array = pa.array([memoryview(buf)], pa.binary_view())
buf_from_pa_array_via_memoryview = np.frombuffer(pa_array.buffers()[2])
np.shares_memory(buf_from_pa_array_via_memoryview, buf)
#> False

# You can force this by creating a binary array manually and casting to a view
pa_array = pa.Array.from_buffers(
    type=pa.binary(),
    length=1,
    buffers=[
        None,
        pa.py_buffer(np.array([0, 1000], dtype=np.int32)),
        pa.py_buffer(buf),
    ],
).cast(pa.binary_view())
buf_from_pa_array_via_memoryview = np.frombuffer(pa_array.buffers()[2])
np.shares_memory(buf_from_pa_array_via_memoryview, buf)

# However, the act of extracting a scalar forces a copy
buf_from_pa_array_scalar = np.frombuffer(pa_array[0].as_buffer())
np.shares_memory(buf_from_pa_array_scalar, buf)
#> False
```

### Component(s)

Python

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
