0x26res opened a new issue, #37009:
URL: https://github.com/apache/arrow/issues/37009

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I'm trying to convert Arrow data of type `pa.timestamp("ns", "UTC")` to pandas using `pd.ArrowDtype`.
   
   It works for columns within a table:
   
   ```python
   df = pa.table(
       {"col1": pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))}
   ).to_pandas(types_mapper=pd.ArrowDtype)
   assert df["col1"].dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
   ```
   
   But the same conversion raises `ArrowInvalid` when called on a standalone array:
   
   ```python
   pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
           types_mapper=pd.ArrowDtype
   )
   ```
   
   ```
   Traceback (most recent call last):
       pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
     File "pyarrow/array.pxi", line 837, in pyarrow.lib._PandasConvertible.to_pandas
     File "pyarrow/array.pxi", line 1446, in pyarrow.lib.Array._to_pandas
     File "pyarrow/array.pxi", line 1679, in pyarrow.lib._array_like_to_pandas
     File "./venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1264, in make_tz_aware
       series = (series.dt.tz_localize('utc')
     File "./venv/lib/python3.10/site-packages/pandas/core/accessor.py", line 112, in f
       return self._delegate_method(name, *args, **kwargs)
     File "./venv/lib/python3.10/site-packages/pandas/core/indexes/accessors.py", line 199, in _delegate_method
       result = getattr(self._parent.array, f"_dt_{name}")(*args, **kwargs)
     File "./venv/lib/python3.10/site-packages/pandas/core/arrays/arrow/array.py", line 2203, in _dt_tz_localize
       result = pc.assume_timezone(
     File "./venv/lib/python3.10/site-packages/pyarrow/compute.py", line 259, in wrapper
       return func.call(args, options, memory_pool)
     File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
   pyarrow.lib.ArrowInvalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'utc'.
   ```
   
   I found a workaround for now: cast to a timezone-naive timestamp, convert, then localize to UTC on the pandas side:
   
   ```python
   series = (
       pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))
       .cast(pa.timestamp("ns"))
       .to_pandas(types_mapper=pd.ArrowDtype)
       .dt.tz_localize("UTC")
   )
   assert series.dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
   ```
   
   
   Full example:
   
   ```python
   import pyarrow as pa
   import pandas as pd
   import pytest
   
   df = pa.table(
       {"col1": pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))}
   ).to_pandas(types_mapper=pd.ArrowDtype)
   assert df["col1"].dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
   
   pa.array([pd.Timestamp.now()], pa.timestamp("ns")).to_pandas(
       types_mapper=pd.ArrowDtype
   )
   
   with pytest.raises(
           pa.ArrowInvalid,
           match=r"Timestamps already have a timezone: 'UTC'. Cannot localize to 'utc'.",
   ):
       pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
           types_mapper=pd.ArrowDtype
       )
   
   series = (
       pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))
       .cast(pa.timestamp("ns"))
       .to_pandas(types_mapper=pd.ArrowDtype)
       .dt.tz_localize("UTC")
   )
   assert series.dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
   ```
   
   This could be related to other issues:
   - https://github.com/apache/arrow/issues/35633
   - https://github.com/aws/aws-sdk-pandas/issues/2410
   
   ### Component(s)
   
   Python

