[
https://issues.apache.org/jira/browse/ARROW-8066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17057949#comment-17057949
]
Markovtsev Vadim commented on ARROW-8066:
-----------------------------------------
Yep, converting mixed timezones to UTC would make a lot of sense IMO.
Pandas prefers datetime64 if the items are within the allowed range (year
1677 - year 2262) and the timezone is the same; otherwise, it falls back to
object dtype holding datetime.datetime.
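
A minimal sketch of the inference described above, and of what normalizing
to UTC could look like on the pandas side. The sample values are made up for
illustration, and this is not what Arrow itself does; it only assumes
pd.to_datetime with utc=True, which converts tz-aware values to UTC.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical sample values, chosen only for illustration.
utc_a = datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone.utc)
utc_b = datetime(2020, 3, 10, 19, 0, 0, tzinfo=timezone.utc)
other = datetime(2020, 3, 10, 18, 13, 49, tzinfo=timezone(timedelta(hours=-5)))

# Same timezone everywhere: pandas infers a tz-aware datetime64 dtype.
same_tz = pd.Series([utc_a, utc_b])
print(same_tz.dtype)

# Mixed timezones: pandas keeps the datetime.datetime objects (dtype object).
mixed_tz = pd.Series([utc_a, other])
print(mixed_tz.dtype)

# Normalizing to UTC keeps the instants and yields one tz-aware dtype,
# at the cost of dropping the original per-item offsets.
normalized = pd.to_datetime(mixed_tz, utc=True)
print(normalized)
```

The trade-off is that the original offsets are not recoverable after the
conversion, so this only helps when the instant matters and not the wall
clock time.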
> [Python] Specify behavior for converting tz-aware datetime.datetime objects
> to Arrow format
> -------------------------------------------------------------------------------------------
>
> Key: ARROW-8066
> URL: https://issues.apache.org/jira/browse/ARROW-8066
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 0.16.0
> Reporter: Markovtsev Vadim
> Priority: Major
>
> The original description is at
> [https://github.com/pandas-dev/pandas/issues/32587]
> h3. Code Sample, a copy-pastable example if possible
> {code:python}
> import pandas as pd
> from datetime import datetime, timezone
> df = pd.DataFrame.from_records([
> (1, datetime.now().replace(tzinfo=timezone.utc)),
> (2, datetime.now().replace(tzinfo=timezone.min))],
> columns=["1", "2"])
> print(df["2"])
> print()
> df.to_feather("/tmp/1")
> df2 = pd.read_feather("/tmp/1")
> print(df2["2"])
> {code}
> This code will output:
> {noformat}
> 0 2020-03-10 18:13:49.405598+00:00
> 1 2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0 2020-03-10 18:13:49.405598
> 1 2020-03-10 18:13:49.405626
> Name: 2, dtype: datetime64[ns]
> {noformat}
> h3. Problem description
> The round-trip dtype changed from the correct `object` to the incorrect
> `datetime64`. Thus the timezones were discarded in Arrow and the timestamps
> became invalid.
> h3. Expected Output
> (identical)
> {noformat}
> 0 2020-03-10 18:13:49.405598+00:00
> 1 2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> 0 2020-03-10 18:13:49.405598+00:00
> 1 2020-03-10 18:13:49.405626-23:59
> Name: 2, dtype: object
> {noformat}
> h3. Output of ``pd.show_versions()``
> {noformat}
> INSTALLED VERSIONS
> ------------------
> commit : None
> python : 3.7.5.final.0
> python-bits : 64
> OS : Linux
> OS-release : 5.3.0-40-generic
> machine : x86_64
> processor : x86_64
> byteorder : little
> LC_ALL : None
> LANG : en_US.UTF-8
> LOCALE : en_US.UTF-8
> pandas : 1.0.1
> numpy : 1.17.4
> pytz : 2019.2
> dateutil : 2.7.3
> pip : 19.3.1
> setuptools : 42.0.1
> Cython : 0.29.14
> pytest : 5.3.1
> hypothesis : None
> sphinx : None
> blosc : None
> feather : None
> xlsxwriter : None
> lxml.etree : 4.5.0
> html5lib : None
> pymysql : None
> psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
> jinja2 : 2.10.3
> IPython : 7.10.0
> pandas_datareader: None
> bs4 : 4.8.1
> bottleneck : None
> fastparquet : None
> gcsfs : None
> lxml.etree : 4.5.0
> matplotlib : 3.1.2
> numexpr : None
> odfpy : None
> openpyxl : None
> pandas_gbq : None
> pyarrow : 0.16.0
> pytables : None
> pytest : 5.3.1
> pyxlsb : None
> s3fs : None
> scipy : 1.2.1
> sqlalchemy : 1.3.12
> tables : None
> tabulate : None
> xarray : None
> xlrd : None
> xlwt : None
> xlsxwriter : None
> numba : None
> {noformat}