[ 
https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17518652#comment-17518652
 ] 

Joris Van den Bossche commented on ARROW-16022:
-----------------------------------------------

One more thing to point out: be careful with how you use {{pytz}} (some call 
it "broken"). Your initial example might not be doing what you expected:

{code:python}
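import datetime
import pyarrow
import pytz

t = pyarrow.timestamp('s', tz='America/New_York')
# passing a pytz timezone directly via tzinfo= is the problematic part
dt = datetime.datetime(2013, 11, 3, 1, 3, 14,
                       tzinfo=pytz.timezone('America/New_York'))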
za = pyarrow.array([dt], t)

>>> print(dt)
2013-11-03 01:03:14-04:56

>>> za
<pyarrow.lib.TimestampArray object at 0x7fa58ecf0fa0>
[
  2013-11-03 05:59:14
]
{code}

Note the strange "-04:56" offset when printing (we would expect either 
"-04:00" or "-05:00"), and the strange UTC value after conversion to a pyarrow 
array (a time of "05:59" instead of "05:03" or "06:03"). 
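
The arithmetic behind those three UTC values can be reproduced with fixed 
offsets from the standard library alone. This is only a sketch: the -04:56 
offset is simply the one printed above, and the helper name {{to_utc}} is made 
up for illustration.

```python
import datetime

def to_utc(wall, offset):
    """Attach a fixed UTC offset to a naive wall time and convert it to UTC."""
    tz = datetime.timezone(offset)
    return wall.replace(tzinfo=tz).astimezone(datetime.timezone.utc)

wall = datetime.datetime(2013, 11, 3, 1, 3, 14)

# the offset that ended up attached in the example above
odd = to_utc(wall, -datetime.timedelta(hours=4, minutes=56))
# the two offsets one would expect for New York on that date
edt = to_utc(wall, -datetime.timedelta(hours=4))
est = to_utc(wall, -datetime.timedelta(hours=5))

print(odd.time(), edt.time(), est.time())
```

Only the first of these matches the "05:59:14" stored in the array, so the 
odd offset was carried through the conversion unchanged.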

This is because the {{dt}} value was created "incorrectly" for how pytz works: 
passing a pytz timezone directly as {{tzinfo}} attaches the zone's earliest 
offset (local mean time, -04:56 for New York) instead of the offset valid on 
that date. Note that your code above works fine when using zoneinfo 
timezones. See https://bugs.launchpad.net/pytz/+bug/1746179 and 
https://blog.ganssle.io/articles/2018/03/pytz-fastest-footgun.html for a more 
detailed explanation.
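
For comparison, a minimal sketch of the same construction with the standard 
library's zoneinfo (Python 3.9+, and assuming time zone data is available on 
the system or through the tzdata package). The variable names are only for 
illustration. Because 01:03:14 occurs twice on that date, the {{fold}} 
attribute selects which occurrence is meant:

```python
import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

ny = ZoneInfo("America/New_York")

# fold=0 (the default) is the first occurrence, still on daylight time
first = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo=ny)
# fold=1 is the second occurrence, after the clocks went back
second = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo=ny, fold=1)

print(first.isoformat(), second.isoformat())
print(first.astimezone(datetime.timezone.utc).time())
print(second.astimezone(datetime.timezone.utc).time())
```

Here passing the timezone via {{tzinfo=}} gives one of the two expected 
offsets, with no separate localize step.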

With the pytz library, the "correct" way to do this is to use {{localize()}} 
(having to do so is one reason many people recommend to stop using pytz):

{code:python}
>>> dt = pytz.timezone('America/New_York').localize(
...     datetime.datetime(2013, 11, 3, 1, 3, 14))
>>> print(dt)
2013-11-03 01:03:14-05:00
>>> pyarrow.array([dt])
<pyarrow.lib.TimestampArray object at 0x7fa58edb4340>
[
  2013-11-03 06:03:14.000000
]
{code}

> [C++] Temporal floor/ceil/round throws exception for timestamps ambiguous due 
> to DST
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-16022
>                 URL: https://issues.apache.org/jira/browse/ARROW-16022
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>            Reporter: Kevin Crouse
>            Priority: Major
>
> Running pyarrow.compute.floor_temporal for timestamps that exist will throw 
> exceptions if the times are ambiguous during the daylight savings time 
> transitions. 
> As the *_temporal functions do not fundamentally change the times, it does 
> not make sense that they would fail due to a timezone issue. If they must 
> fail, it should be when the pyarrow.Timestamp is created.
>  
>  
> {code:java}
> import pyarrow
> import pyarrow.compute as pc
> import datetime
> import pytz
> t = pyarrow.timestamp('s', tz='America/New_York')
> dt = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo = 
> pytz.timezone('America/New_York'))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t) 
> # raises an exception, even though this is conceptually an identity function 
> # here
> pc.floor_temporal(za, unit = 'second') {code}
>  
> And this actually works just fine (continued from above)
> {code:java}
> pc.cast(    
>     pc.floor_temporal(        
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),         
>     unit='second'),     
>     pyarrow.timestamp('s','America/New_York')
> )
>  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
