jorisvandenbossche commented on issue #13316:
URL: https://github.com/apache/arrow/issues/13316#issuecomment-1147141293

   @zhangyingmath that is expected, although I see that this is not properly 
documented in the docstring 
(https://arrow.apache.org/docs/dev/python/generated/pyarrow.Table.html#pyarrow.Table.to_pandas).
 It is mentioned in the user guide at 
https://arrow.apache.org/docs/dev/python/pandas.html#memory-usage-and-zero-copy 
(emphasis mine):
   
   > `split_blocks=True`, when enabled `Table.to_pandas` produces one internal 
DataFrame “block” for each column, skipping the “consolidation” step. Note that 
many pandas operations will trigger consolidation anyway, but the peak memory 
use may be less than the worst case scenario of a full memory doubling. **As a 
result of this option, we are able to do zero copy conversions of columns in 
the same cases where we can do zero copy with Array and ChunkedArray.**
   
   Currently, using `split_blocks=True` also automatically implies that 
pyarrow will try to do the conversion zero-copy. And when the conversion is 
zero-copy, the resulting numpy arrays are read-only, because in pyarrow the 
data is immutable. 
   If you want to mutate the data afterwards, I think the best option is to 
make a copy of the resulting dataframe (although that defeats the purpose of 
using `split_blocks=True` to avoid copies).
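
   For reference, a minimal sketch of what this looks like in practice 
(assuming a single-chunk float column without nulls, so the conversion can 
actually happen zero-copy; whether the `.loc` assignment raises depends on the 
pandas version and copy-on-write settings):

```python
import pyarrow as pa

table = pa.table({"a": [1.0, 2.0, 3.0]})

# One pandas block per column; pyarrow converts zero-copy where it can.
df = table.to_pandas(split_blocks=True)

# When the conversion was zero-copy, the backing numpy array is read-only,
# because the underlying Arrow buffers are immutable.
print(df["a"].to_numpy().flags.writeable)  # False for a zero-copy column

try:
    # On older pandas (without copy-on-write) this raises, e.g.
    # "ValueError: assignment destination is read-only"
    df.loc[0, "a"] = 10.0
except ValueError as exc:
    print(exc)

# Workaround: copy the DataFrame first (which gives up the memory savings
# that split_blocks=True was meant to provide).
df_mutable = df.copy()
df_mutable.loc[0, "a"] = 10.0
```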

