sergun commented on issue #38644: URL: https://github.com/apache/arrow/issues/38644#issuecomment-1804027022
> The `zero_copy_only` keyword hasn't really been made to work with `types_mapper`, so yes this can be seen as a wrong error message. However, it's also difficult to properly implement that, because pyarrow doesn't know if the type you pass to `types_mapper` will be able to convert zero-copy or not. (but the message about "multi-column DataFrame block" is certainly confusing)
>
> Now, is there a reason you want to use this keyword? If you use `types_mapper=pd.ArrowDtype`, you have the guarantee that the conversions are zero-copy.

Thanks a lot!

My original chain of reasoning was as follows. I called

```
df = table.to_pandas()  # zero_copy_only=False is the default
```

checked `df.dtypes`, and saw something like:

```
print(df.dtypes)
a    int64
dtype: object
```

This surprised me, since I expected `pa.Table.to_pandas()` with default parameters to do its best not to convert the data, keeping Arrow-backed (pandas 2.x) types, so I expected:

```
print(df.dtypes)
a    int64[pyarrow]
dtype: object
```

Next I decided to force PyArrow to keep the types with `types_mapper=pd.ArrowDtype` and saw:

```
df = table.to_pandas(types_mapper=pd.ArrowDtype)  # zero_copy_only=False is the default
print(df.dtypes)
a    int64[pyarrow]
dtype: object
```

Finally, to double-check that zero-copy was actually used, I called:

```
df = table.to_pandas(types_mapper=pd.ArrowDtype, zero_copy_only=True)
```

and got the unexpected exception. That is my user story :-)

This is probably just non-intuitive behaviour of `pa.Table.to_pandas(...)`, or I missed something in the documentation. And you are right, the "multi-column DataFrame block" message is also confusing.
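
For completeness, here is a minimal self-contained sketch of these steps (assuming pandas >= 2.0 and a recent pyarrow; the exact dtype strings, exception type, and message may vary between versions):

```
# Minimal sketch of the steps above (assumes pandas >= 2.0 and a recent pyarrow).
import pandas as pd
import pyarrow as pa

table = pa.table({"a": [1, 2, 3]})

# 1. Default conversion: the column comes back as a NumPy-backed int64.
df = table.to_pandas()
print(df.dtypes)  # a    int64

# 2. Asking for Arrow-backed dtypes keeps the Arrow representation.
df = table.to_pandas(types_mapper=pd.ArrowDtype)
print(df.dtypes)  # a    int64[pyarrow]

# 3. Combining types_mapper with zero_copy_only=True raises the
#    confusing "multi-column DataFrame block" error discussed in this issue.
try:
    table.to_pandas(types_mapper=pd.ArrowDtype, zero_copy_only=True)
except Exception as exc:
    print(type(exc).__name__, exc)
```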
