0x26res opened a new issue, #37009:
URL: https://github.com/apache/arrow/issues/37009
### Describe the bug, including details regarding any error messages, version, and platform.
I'm trying to convert Arrow data of type `pa.timestamp("ns", "UTC")` to
pandas using `pd.ArrowDtype`.
It works for columns within a table:
```python
df = pa.table(
    {"col1": pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))}
).to_pandas(types_mapper=pd.ArrowDtype)
assert df["col1"].dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
```
But it doesn't work for single arrays:
```python
pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
    types_mapper=pd.ArrowDtype
)
```
```
Traceback (most recent call last):
    pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
  File "pyarrow/array.pxi", line 837, in pyarrow.lib._PandasConvertible.to_pandas
  File "pyarrow/array.pxi", line 1446, in pyarrow.lib.Array._to_pandas
  File "pyarrow/array.pxi", line 1679, in pyarrow.lib._array_like_to_pandas
  File "./venv/lib/python3.10/site-packages/pyarrow/pandas_compat.py", line 1264, in make_tz_aware
    series = (series.dt.tz_localize('utc')
  File "./venv/lib/python3.10/site-packages/pandas/core/accessor.py", line 112, in f
    return self._delegate_method(name, *args, **kwargs)
  File "./venv/lib/python3.10/site-packages/pandas/core/indexes/accessors.py", line 199, in _delegate_method
    result = getattr(self._parent.array, f"_dt_{name}")(*args, **kwargs)
  File "./venv/lib/python3.10/site-packages/pandas/core/arrays/arrow/array.py", line 2203, in _dt_tz_localize
    result = pc.assume_timezone(
  File "./venv/lib/python3.10/site-packages/pyarrow/compute.py", line 259, in wrapper
    return func.call(args, options, memory_pool)
  File "pyarrow/_compute.pyx", line 367, in pyarrow._compute.Function.call
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: Timestamps already have a timezone: 'UTC'. Cannot localize to 'utc'.
```
I found a workaround for now:
```
series = (
    pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))
    .cast(pa.timestamp("ns"))
    .to_pandas(types_mapper=pd.ArrowDtype)
    .dt.tz_localize("UTC")
)
assert series.dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
```
Full example:
```python
import pyarrow as pa
import pandas as pd
import pytest
# Converting a tz-aware table column works.
df = pa.table(
    {"col1": pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))}
).to_pandas(types_mapper=pd.ArrowDtype)
assert df["col1"].dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
# Converting a tz-naive array works.
pa.array([pd.Timestamp.now()], pa.timestamp("ns")).to_pandas(
    types_mapper=pd.ArrowDtype
)
# Converting a tz-aware array raises.
with pytest.raises(
    pa.ArrowInvalid,
    match=r"Timestamps already have a timezone: 'UTC'. Cannot localize to 'utc'.",
):
    pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC")).to_pandas(
        types_mapper=pd.ArrowDtype
    )
# Workaround: drop the timezone, convert, then localize again in pandas.
series = (
    pa.array([pd.Timestamp.utcnow()], pa.timestamp("ns", "UTC"))
    .cast(pa.timestamp("ns"))
    .to_pandas(types_mapper=pd.ArrowDtype)
    .dt.tz_localize("UTC")
)
assert series.dtype == pd.ArrowDtype(pa.timestamp("ns", "UTC"))
```
This could be related to other issues:
- https://github.com/apache/arrow/issues/35633
- https://github.com/aws/aws-sdk-pandas/issues/2410
### Component(s)
Python