Diego Argueta created ARROW-3703:
------------------------------------

             Summary: [Python] DataFrame.to_parquet crashes if datetime column has time zones
                 Key: ARROW-3703
                 URL: https://issues.apache.org/jira/browse/ARROW-3703
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.11.1
         Environment: pandas 0.23.4
                      pyarrow 0.11.1
                      Python 3.5 - 3.7
                      MacOS High Sierra (10.13.6)
            Reporter: Diego Argueta

On CPython 3.5.6, 3.6.6, and 3.7.0, a Pandas DataFrame with a {{datetime.datetime}} column serializes to Parquet just fine, but the write crashes with an {{AttributeError}} if the datetimes use the built-in {{datetime.timezone}} objects as their time zone.

To reproduce:

{code:python}
import datetime as dt
import pandas as pd

df = pd.DataFrame({'foo': [dt.datetime(2018, 1, 1, 1, 23, 45, tzinfo=dt.timezone.utc)]})
df.to_parquet('data.parq')
{code}

The following exception results:

{noformat}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/core/frame.py", line 1945, in to_parquet
    compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 257, in to_parquet
    return impl.write(df, path, compression=compression, **kwargs)
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pandas/io/parquet.py", line 118, in write
    table = self.api.Table.from_pandas(df)
  File "pyarrow/table.pxi", line 1217, in pyarrow.lib.Table.from_pandas
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 381, in dataframe_to_arrays
    convert_types)]
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 380, in <listcomp>
    for c, t in zip(columns_to_convert,
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 370, in convert_column
    return pa.array(col, type=ty, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 167, in pyarrow.lib.array
  File "/Users/tux/.pyenv/versions/3.7.0/lib/python3.7/site-packages/pyarrow/pandas_compat.py", line 409, in get_datetimetz_type
    type_ = pa.timestamp(unit, tz)
  File "pyarrow/types.pxi", line 1038, in pyarrow.lib.timestamp
  File "pyarrow/types.pxi", line 955, in pyarrow.lib.tzinfo_to_string
AttributeError: 'datetime.timezone' object has no attribute 'zone'
{noformat}

This doesn't happen if you use {{pytz.UTC}} as the timezone object.

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
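
The traceback ends in {{tzinfo_to_string}} reading a {{.zone}} attribute, which pytz time zones have and the standard library's {{datetime.timezone}} does not. As a rough illustration of how a conversion could cope with both kinds of object, here is a minimal sketch. The helper name {{tz_to_string}} is hypothetical and this is not pyarrow's actual code or its eventual fix; it assumes a fixed-offset string of the form "+HH:MM" is an acceptable fallback when no zone name is available.

```python
import datetime as dt


def tz_to_string(tz):
    """Return a time zone string for a tzinfo object (hypothetical helper)."""
    # pytz time zones expose their Olson name as `.zone`; the traceback above
    # shows that attribute being read unconditionally.
    zone = getattr(tz, "zone", None)
    if zone is not None:
        return zone

    # Standard-library `datetime.timezone` objects carry a fixed offset but
    # no `.zone`, so fall back to formatting the offset as "+HH:MM"/"-HH:MM".
    offset = tz.utcoffset(None)
    if offset is None:
        raise ValueError("cannot determine a fixed UTC offset for %r" % (tz,))
    total_minutes = int(offset.total_seconds()) // 60
    sign = "-" if total_minutes < 0 else "+"
    hours, minutes = divmod(abs(total_minutes), 60)
    return "{}{:02d}:{:02d}".format(sign, hours, minutes)
```

With this sketch, {{tz_to_string(dt.timezone.utc)}} returns {{"+00:00"}} instead of raising. Until the library handles such objects itself, the workaround reported above still applies: build the datetimes with {{pytz.UTC}} rather than {{dt.timezone.utc}}.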