Re: [I] ChunkedArray.to_numpy() sets the OWNDATA and WRITEABLE flags to False [arrow]

via GitHub Wed, 07 Feb 2024 02:42:32 -0800


AlenkaF commented on issue #38789:
URL: https://github.com/apache/arrow/issues/38789#issuecomment-1931766655


   Thank you for opening an issue and sorry for such a late reply!
   
   I agree, the documentation for `ChunkedArray.to_numpy` is a bit misleading. 
But the behaviour should actually be correct!
   
   The example you are using in the issue description is a special case: the 
chunked array consists of only one chunk and can therefore be converted to 
NumPy zero-copy:
   
   ```python
   In [1]: import pyarrow as pa
   
   In [2]: arr = pa.chunked_array([[1.1]])
   
   In [3]: arr.to_numpy(zero_copy_only=False).base.base
   Out[3]: 
   <pyarrow.lib.ChunkedArray object at 0x105329170>
   [
     [
       1.1
     ]
   ]
   
   In [4]: arr
   Out[4]: 
   <pyarrow.lib.ChunkedArray object at 0x105329170>
   [
     [
       1.1
     ]
   ]
   ```
   
   which means the check in PyArrow that requires `zero_copy_only=False` 
should be changed to allow zero-copy conversion when there is a single chunk, 
and the docs for the `zero_copy_only` parameter updated accordingly:
   
   
https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/table.pxi#L498-L501
   
   
https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/table.pxi#L481-L485
   
   But the flags `OWNDATA : False` and `WRITEABLE : False` are correct, since 
the NumPy array is a read-only view onto memory owned by Arrow.
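
   To illustrate what the two flags mean independently of Arrow, here is a NumPy-only sketch (not PyArrow code). An immutable `bytes` object stands in for memory owned by another library, such as an Arrow buffer:

```python
import numpy as np

# Immutable bytes stand in for memory owned by someone else (e.g. Arrow).
raw = np.array([1.1, 2.2]).tobytes()

# A view: NumPy borrows the memory, so it neither owns it nor may write to it.
view = np.frombuffer(raw, dtype=np.float64)
print("view:", view.flags.owndata, view.flags.writeable)  # view: False False

# A copy: NumPy allocates fresh memory that it owns and may write to.
copy = view.copy()
print("copy:", copy.flags.owndata, copy.flags.writeable)  # copy: True True
```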
   
   If the chunked array has more than one chunk, the data has to be copied 
into a new contiguous buffer, so the `WRITEABLE` flag is different (`True`). 
`OWNDATA` is still `False`, though, because Arrow holds on to the memory of 
the Buffer created for the copy:
   
   ```python
   In [5]: arr = pa.chunked_array([[1.1], [2.2]])
   
   In [6]: arr.to_numpy(zero_copy_only=False).base.base
   Out[6]: <capsule object "arrow::Buffer" at 0x10a017b10>
   ```
   
   To sum up, what I think we should do is:
   
   - Change the docstring for 
https://arrow.apache.org/docs/python/generated/pyarrow.ChunkedArray.html#pyarrow.ChunkedArray.to_numpy
 to something like "Return a NumPy copy of this array, except in the case of a 
single chunk, where it follows the same behaviour as `pyarrow.Array.to_numpy`." 
The "experimental" label should also be removed =)
   - Change the docstring for the `zero_copy_only` parameter, and relax the 
check on this parameter in `ChunkedArray.to_numpy` so that 
`zero_copy_only=True` is accepted for chunked arrays with only one chunk.
   
   Your contribution would be more than welcome, thanks!

