Lucas Pickup created ARROW-1435:
-----------------------------------
Summary: PyArrow not propagating timezone information from Parquet
to Python
Key: ARROW-1435
URL: https://issues.apache.org/jira/browse/ARROW-1435
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 0.6.0
Reporter: Lucas Pickup
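PyArrow reads timezone metadata for Timestamp values from Parquet, but this
information isn't propagated through to the resulting Python datetime object.
In the session below, the DateAware column is written as
timestamp[ns, tz=UTC], comes back from read_table as timestamp[us] with no
timezone, and converts to a naive column in pandas.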
{noformat}
λ python
Python 3.5.2 |Continuum Analytics, Inc.| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyarrow as pa
>>> import pyarrow.parquet as pq
>>> import pytz
>>> import pandas
>>> from datetime import datetime
>>>
>>> d1 = datetime.strptime('2015-07-05 23:50:00', '%Y-%m-%d %H:%M:%S')
>>> d1
datetime.datetime(2015, 7, 5, 23, 50)
>>> aware = pytz.utc.localize(d1)
>>> aware
datetime.datetime(2015, 7, 5, 23, 50, tzinfo=<UTC>)
>>>
>>> df = pandas.DataFrame()
>>> df['DateNaive'] = [d1]
>>> df['DateAware'] = [aware]
>>> df
            DateNaive                 DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00+00:00
>>>
>>> table = pa.Table.from_pandas(df)
>>> table
pyarrow.Table
DateNaive: timestamp[ns]
DateAware: timestamp[ns, tz=UTC]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive",
"pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null},
{"name": "DateAware", "pandas_type": "datetimetz", "numpy_type":
"datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns":
["__index_level_0__"]}
>>>
>>> pq.write_table(table, "E:\\pyarrowDates.parquet")
>>>
>>> pyarrowTable = pq.read_table("E:\\pyarrowDates.parquet")
>>> pyarrowTable
pyarrow.Table
DateNaive: timestamp[us]
DateAware: timestamp[us]
__index_level_0__: int64
-- metadata --
pandas: {"pandas_version": "0.20.3", "columns": [{"name": "DateNaive",
"pandas_type": "datetime", "numpy_type": "datetime64[ns]", "metadata": null},
{"name": "DateAware", "pandas_type": "datetimetz", "numpy_type":
"datetime64[ns, UTC]", "metadata": {"timezone": "UTC"}}], "index_columns":
["__index_level_0__"]}
>>>
>>> pyarrowDF = pyarrowTable.to_pandas()
>>> pyarrowDF
            DateNaive           DateAware
0 2015-07-05 23:50:00 2015-07-05 23:50:00
{noformat}
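To show why the dropped timezone matters downstream, here is a minimal sketch using only the standard library. It does not call PyArrow; it only mimics the two values from the session above (the naive and the UTC-aware 2015-07-05 23:50:00). The helper names `is_aware` and `restore_utc` are hypothetical and not part of PyArrow or pandas, and `restore_utc` assumes the caller already knows the value was written as UTC.

```python
from datetime import datetime, timezone

# The same wall-clock value as in the report: once naive, once UTC-aware.
naive = datetime(2015, 7, 5, 23, 50)
aware = naive.replace(tzinfo=timezone.utc)


def is_aware(dt):
    """Return True if dt carries a usable UTC offset (hypothetical helper)."""
    return dt.tzinfo is not None and dt.utcoffset() is not None


def restore_utc(dt):
    """Re-attach UTC to a naive value that is known to have been written as UTC.

    Hypothetical workaround: it is only correct under that assumption.
    """
    return dt if is_aware(dt) else dt.replace(tzinfo=timezone.utc)


print(is_aware(naive), is_aware(aware))

# A naive and an aware datetime never compare equal, so a value that loses
# its timezone on the round trip no longer matches the value that was written.
print(naive == aware)

# Once UTC is re-attached, the round-tripped value matches the original again.
print(restore_utc(naive) == aware)
```

A check like `is_aware` on the values read back would flag the behaviour reported here without having to inspect the table schema by hand.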