sergun commented on issue #38644: URL: https://github.com/apache/arrow/issues/38644#issuecomment-1804027022
> The `zero_copy_only` keyword hasn't really been made to work with `types_mapper`, so yes this can be seen as a wrong error message. However, it's also difficult to properly implement that, because pyarrow doesn't know if the type you pass to `types_mapper` will be able to convert zero-copy or not. (but the message about "multi-column DataFrame block" is certainly confusing)
>
> Now, is there a reason you want to use this keyword? If you use `types_mapper=pd.ArrowDtype`, you have the guarantee that the conversions are zero-copy.

Thanks a lot!

My original chain of reasoning was as follows. I called

```
df = table.to_pandas()  # zero_copy_only=False is the default
```

checked `df.dtypes`, and saw something like:

```
print(df.dtypes)
a    int64
dtype: object
```

This surprised me, since I expected `pa.Table.to_pandas()` with default parameters to do its best not to convert the data, keeping Arrow-backed (pandas 2.x) types, so I expected:

```
print(df.dtypes)
a    int64[pyarrow]
dtype: object
```

Next I decided to force PyArrow to keep the types with `types_mapper=pd.ArrowDtype` and saw:

```
df = table.to_pandas(types_mapper=pd.ArrowDtype)  # zero_copy_only=False is the default
print(df.dtypes)
a    int64[pyarrow]
dtype: object
```

Finally, to double-check that zero-copy was actually used, I called:

```
df = table.to_pandas(types_mapper=pd.ArrowDtype, zero_copy_only=True)
```

and got the unexpected exception. That is my user story :-)

This is probably just non-intuitive behaviour of `pa.Table.to_pandas(...)`, or I missed something in the documentation. And you are right, the "multi-column DataFrame block" message is also confusing.
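
For completeness, here is a minimal self-contained sketch of these steps (assuming pandas >= 2.0 and a recent pyarrow; the exact dtype strings, exception type, and message may vary between versions):

```
# Minimal sketch of the steps above (assumes pandas >= 2.0 and a recent pyarrow).
import pandas as pd
import pyarrow as pa

table = pa.table({"a": [1, 2, 3]})

# 1. Default conversion: the column comes back as a NumPy-backed int64.
df = table.to_pandas()
print(df.dtypes)  # a    int64

# 2. Asking for Arrow-backed dtypes keeps the Arrow representation.
df = table.to_pandas(types_mapper=pd.ArrowDtype)
print(df.dtypes)  # a    int64[pyarrow]

# 3. Combining types_mapper with zero_copy_only=True raises the
#    confusing "multi-column DataFrame block" error discussed in this issue.
try:
    table.to_pandas(types_mapper=pd.ArrowDtype, zero_copy_only=True)
except Exception as exc:
    print(type(exc).__name__, exc)
```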
