[ 
https://issues.apache.org/jira/browse/ARROW-5912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17193630#comment-17193630
 ] 

Joris Van den Bossche commented on ARROW-5912:
----------------------------------------------

In the meantime, this now results in a tz-aware pyarrow array. However, it 
only takes the first timezone encountered. Ideally, I think it would use UTC 
when there are multiple timezones, but the result is at least more correct now 
(the actual values stored under the hood are correctly normalized to UTC).
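
As a rough illustration of what normalizing to UTC means here, the sketch 
below uses only the standard library. The fixed offsets stand in for 
Europe/Paris and Europe/Helsinki as they applied on 1970-01-01, and 
normalize_to_utc is a hypothetical helper, not part of pyarrow or pandas:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets approximate the two zones on 1970-01-01.
PARIS = timezone(timedelta(hours=1))
HELSINKI = timezone(timedelta(hours=2))
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def normalize_to_utc(values):
    """Convert tz-aware datetimes to UTC (hypothetical helper)."""
    out = []
    for v in values:
        if v.tzinfo is None:
            # A naive datetime has no defined instant, so refuse to guess.
            raise ValueError("tz-naive datetime cannot be normalized")
        out.append(v.astimezone(timezone.utc))
    return out


ts_paris = datetime(1970, 1, 1, 1, 0, tzinfo=PARIS)
ts_helsinki = datetime(1970, 1, 1, 2, 0, tzinfo=HELSINKI)

normalized = normalize_to_utc([ts_paris, ts_helsinki])

# Integer microseconds since the epoch, the kind of value a
# timestamp[us] array stores under the hood.
epoch_us = [(v - EPOCH) // timedelta(microseconds=1) for v in normalized]
```

Both inputs describe the same instant, so after normalization they compare 
equal and map to the same stored integer, regardless of which timezone each 
object originally carried.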

> [Python] conversion from datetime objects with mixed timezones should 
> normalize to UTC
> --------------------------------------------------------------------------------------
>
>                 Key: ARROW-5912
>                 URL: https://issues.apache.org/jira/browse/ARROW-5912
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>              Labels: beginner
>
> Currently, when converting datetime objects with mixed timezones, each 
> object is separately interpreted as its local time:
> {code:python}
> >>> import pandas as pd
> >>> import pyarrow as pa
> >>> ts_pd_paris = pd.Timestamp("1970-01-01 01:00", tz="Europe/Paris")
> >>> ts_pd_paris
> Timestamp('1970-01-01 01:00:00+0100', tz='Europe/Paris')
> >>> ts_pd_helsinki = pd.Timestamp("1970-01-01 02:00", tz="Europe/Helsinki")
> >>> ts_pd_helsinki
> Timestamp('1970-01-01 02:00:00+0200', tz='Europe/Helsinki')
> >>> a = pa.array([ts_pd_paris, ts_pd_helsinki])
> >>> a
> <pyarrow.lib.TimestampArray object at 0x7f7856c4a360>
> [
>   1970-01-01 01:00:00.000000,
>   1970-01-01 02:00:00.000000
> ]
> >>> a.type
> TimestampType(timestamp[us])
> {code}
> So both times actually refer to the same moment in time (the same value in 
> UTC; in pandas their stored {{value}} is also the same), but once converted 
> to pyarrow, they are both tz-naive and no longer the same time. That seems 
> rather unexpected and a source of bugs.
> I think a better option would be to normalize to UTC and return a 
> tz-aware TimestampArray with UTC as its timezone.
> That is also the behaviour of pandas if you force the conversion to result in 
> datetimes (by default, pandas will keep them as an object array, preserving 
> the different timezones).



