AlenkaF commented on issue #49002: URL: https://github.com/apache/arrow/issues/49002#issuecomment-3846435369
I needed to do a bit of research to understand this issue better. Consider a pyarrow array that is not an extension type and is converted with `to_pandas()` without a `types_mapper` argument. In that case PyArrow takes the C++ code path, which transforms the pyarrow array into a NumPy array. The pandas Series is then constructed from that NumPy array through the pandas API.

Instead, one can pass `types_mapper=pd.ArrowDtype`. The dtype's `__from_arrow__` dunder method then takes precedence over the C++ conversion: https://github.com/apache/arrow/blob/d2315fe00345b87a28f8fb268a1017934d4bf58a/python/pyarrow/array.pxi#L2280-L2284

With that, this works as I would expect:

```python
>>> import pyarrow as pa
>>> import pandas as pd
>>> pd.__version__
'3.0.0'
>>> pa.__version__
'24.0.0.dev46+g1c1d25f29'
>>> pa.array(["a"], type="str").to_pandas(types_mapper=pd.ArrowDtype)
0    a
dtype: string[pyarrow]
>>> pa.array([None], type="str").to_pandas(types_mapper=pd.ArrowDtype)
0    <NA>
dtype: string[pyarrow]
```

Now, the example reported in this issue probably cannot work as-is, because pandas has to infer the type information when the C++ code path is taken? We hand a NumPy object array to the pandas Series constructor, and I think pandas then has to infer the data type, because we default to object in this case: https://github.com/apache/arrow/blob/d2315fe00345b87a28f8fb268a1017934d4bf58a/python/pyarrow/src/arrow/python/arrow_to_pandas.cc#L2133. Most probably that inference cannot be done for an empty array?

In any case, `types_mapper=pd.ArrowDtype` needs to be documented in our user guide. We might also add a check in `_array_like_to_pandas` that defaults to `pd.ArrowDtype` for string and similar types (without overriding a user-supplied `types_mapper`)?

cc @jorisvandenbossche @WillAyd
