[ https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

lidavidm updated ARROW-5912:
----------------------------
    Labels: beginner  (was: )

> [Python] conversion from datetime objects with mixed timezones should 
> normalize to UTC
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-5912
>                 URL: https://issues.apache.org/jira/browse/ARROW-5912
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: beginner
>             Fix For: 1.0.0
>
>
> Currently, when converting datetime objects with mixed timezones, each one is
> interpreted separately as its local (wall-clock) time and the timezone is dropped:
> {code:python}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris")
> >>> ts_pd_paris
> Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris')
> >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki")
> >>> ts_pd_helsinki
> Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki')
> >>> a = pa.array([ts_pd_paris, ts_pd_helsinki])
> >>> a
> <pyarrow.lib.TimestampArray object at 0x7f7856c4a360>
> [
>   1970-01-01 01:00:00.000000,
>   1970-01-01 02:00:00.000000
> ]
> >>> a.type
> TimestampType(timestamp[us])
> {code}
> So both timestamps represent exactly the same moment in time (1970-01-01 00:00
> UTC; in pandas their stored {{value}} is also the same), but once converted
> to pyarrow, they are both tz-naive and no longer equal. That seems rather
> unexpected and a likely source of bugs.
> I think a better option would be to normalize to UTC and return a tz-aware
> TimestampArray with UTC as its timezone.
> That is also the behaviour of pandas if you force the conversion to result in
> datetimes (by default, pandas keeps them as an object array, preserving the
> different timezones).



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
