AlenkaF commented on issue #38789:
URL: https://github.com/apache/arrow/issues/38789#issuecomment-1931766655
Thank you for opening an issue and sorry for such a late reply!
I agree, the documentation for `ChunkedArray.to_numpy` is a bit misleading.
But the behaviour should actually be correct!
The example you are using in the issue description is a special case as the
chunked array consists only of one chunk and can therefore be converted to
numpy zero copy:
```python
In [1]: import pyarrow as pa
In [2]: arr = pa.chunked_array([[1.1]])
In [3]: arr.to_numpy(zero_copy_only=False).base.base
Out[3]:
<pyarrow.lib.ChunkedArray object at 0x105329170>
[
[
1.1
]
]
In [4]: arr
Out[4]:
<pyarrow.lib.ChunkedArray object at 0x105329170>
[
[
1.1
]
]
```
which means the check on `zero_copy_only` in PyArrow should be changed to
allow zero copy when there is only a single chunk, and the docs for the
`zero_copy_only` parameter should be updated to match:
https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/table.pxi#L498-L501
https://github.com/apache/arrow/blob/a6e577d031d20a1a7d3dd01536b9a77db5d1bff8/python/pyarrow/table.pxi#L481-L485
But the flags `OWNDATA : False` and `WRITEABLE : False` are correct.
Now if the chunked array has more than one chunk, the `WRITEABLE` flag is
`True` rather than `False`, but `OWNDATA` is still `False` because Arrow is
holding on to the memory of the Buffer created with the copy:
```python
In [5]: arr = pa.chunked_array([[1.1], [2.2]])
In [6]: arr.to_numpy(zero_copy_only=False).base.base
Out[6]: <capsule object "arrow::Buffer" at 0x10a017b10>
```
To sum up, what I think we should do is:
- Change the docstrings for
https://arrow.apache.org/docs/python/generated/pyarrow.ChunkedArray.html#pyarrow.ChunkedArray.to_numpy
to something like "Return a NumPy copy of this array except in case of a
single chunk when it follows the same behaviour as `pyarrow.Array.to_numpy`."
The "experimental" should also be removed =)
- Change the docstring for the `zero_copy_only` parameter and change the check
for this parameter in `ChunkedArray.to_numpy` so that chunked arrays with only
one chunk are allowed to convert zero copy.
Your contribution would be more than welcome, thanks!