Brandon B. Miller created ARROW-9772:
----------------------------------------
Summary: Optionally allow for to_pandas to return writeable pandas
objects
Key: ARROW-9772
URL: https://issues.apache.org/jira/browse/ARROW-9772
Project: Apache Arrow
Issue Type: New Feature
Components: Python
Affects Versions: 0.17.1
Reporter: Brandon B. Miller

In cuDF, I'd like to leverage pyarrow to facilitate the conversion from cuDF
series and dataframe objects into the equivalent pandas objects. Concretely, I'd
like something like this to work:
`pandas_object = cudf_object.to_arrow().to_pandas()`.
This allows us to stay consistent with the way the rest of the pyarrow
ecosystem handles nulls, dtype conversions and the like without having to
reinvent the wheel. However, I noticed that in some zero-copy scenarios, pyarrow
doesn't seem to fully release the underlying buffers when converting with
`to_pandas()`. The resulting objects are immutable, and anyone who tries to
mutate the data will encounter
`ValueError: assignment destination is read-only`
This creates a slightly strange situation where a user might encounter issues
that subtly stem from the fact that Arrow was used to construct the offending
pandas object. It would be nice to be able to toggle this behavior using a
kwarg or something similar. I suspect this could come up in other situations
where libraries want to convert back and forth between equivalent Python
objects through Arrow and expect the final object they get to behave as if it
were constructed via other means.