[ https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
lidavidm updated ARROW-5912:
----------------------------
    Labels: beginner  (was: )

> [Python] conversion from datetime objects with mixed timezones should normalize to UTC
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-5912
>                 URL: https://issues.apache.org/jira/browse/ARROW-5912
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.0.0
>
>
> Currently, when converting objects with mixed timezones, each one is
> interpreted separately as its local time:
> {code:python}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris")
> >>> ts_pd_paris
> Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris')
> >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki")
> >>> ts_pd_helsinki
> Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki')
> >>> a = pa.array([ts_pd_paris, ts_pd_helsinki])
> >>> a
> <pyarrow.lib.TimestampArray object at 0x7f7856c4a360>
> [
>   1970-01-01 01:00:00.000000,
>   1970-01-01 02:00:00.000000
> ]
> >>> a.type
> TimestampType(timestamp[us])
> {code}
> Both timestamps refer to the same moment in time (the same value in UTC; in
> pandas their stored {{value}} is also the same), but once converted to
> pyarrow they are both tz-naive and no longer represent the same time. That
> seems rather unexpected and a source of bugs.
> I think a better option would be to normalize to UTC and return a tz-aware
> TimestampArray with UTC as its timezone.
> That is also the behaviour of pandas if you force the conversion to result in
> datetimes (by default pandas keeps them as an object array, preserving the
> different timezones).
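
For illustration only, the difference between the reported behaviour and the proposed one can be sketched with the standard library alone. This is not the pyarrow implementation: `normalize_to_utc` and `drop_tzinfo` are hypothetical helper names, and the fixed offsets `PARIS` and `HELSINKI` are assumptions standing in for Europe/Paris and Europe/Helsinki on 1970-01-01.

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets stand in for Europe/Paris (+01:00) and
# Europe/Helsinki (+02:00) on 1970-01-01.
PARIS = timezone(timedelta(hours=1))
HELSINKI = timezone(timedelta(hours=2))


def drop_tzinfo(values):
    """Keep each wall-clock time and discard the zone (the reported behaviour)."""
    return [v.replace(tzinfo=None) for v in values]


def normalize_to_utc(values):
    """Convert every tz-aware datetime to UTC (the proposed behaviour).

    Hypothetical helper for illustration, not a pyarrow API.
    """
    return [v.astimezone(timezone.utc) for v in values]


ts_paris = datetime(1970, 1, 1, 1, 0, tzinfo=PARIS)
ts_helsinki = datetime(1970, 1, 1, 2, 0, tzinfo=HELSINKI)

naive = drop_tzinfo([ts_paris, ts_helsinki])
utc = normalize_to_utc([ts_paris, ts_helsinki])

print(naive[0] == naive[1])  # False: 01:00 vs 02:00, zone information lost
print(utc[0] == utc[1])      # True: both are 1970-01-01 00:00 UTC
```

Dropping the zone leaves two different naive values for the same instant, while normalizing to UTC keeps them equal, which is the property the issue asks the conversion to preserve.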