[ 
https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512449#comment-17512449
 ] 

Rok Mihevc commented on ARROW-16022:
------------------------------------

If you want to convert local time to UTC without going through pandas, you can 
use assume_timezone (similar to pandas' tz_localize):
{code:python}
import pyarrow.compute as pc
import pyarrow as pa

arr = pa.array(["2022-11-06 01:00:00"]).cast(pa.timestamp("ms"))
arr_zoned = pc.assume_timezone(
    arr,
    "America/New_York",
    nonexistent="earliest",
    ambiguous="earliest"
)
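# Works: the array is cast to UTC before rounding.
pc.floor_temporal(arr_zoned.cast(pa.timestamp("ms", "UTC")), unit="second", multiple=1)
# Raises ArrowInvalid: rounding happens in local time, which is ambiguous here.
pc.floor_temporal(arr_zoned, unit="second", multiple=1)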

---------------------------------------------------------------------------

ArrowInvalid                              Traceback (most recent call last)
Input In [218], in <module>
...
     11 pc.floor_temporal(arr_zoned.cast(pa.timestamp("ms", "UTC")), unit="second", multiple=1)
---> 12 pc.floor_temporal(arr_zoned, unit="second", multiple=1)
...
ArrowInvalid: Local time is ambiguous: 2022-11-06 01:00:00.000 is ambiguous. It could be
2022-11-06 01:00:00.000 EDT == 2022-11-06 05:00:00.000 UTC or
2022-11-06 01:00:00.000 EST == 2022-11-06 06:00:00.000 UTC
{code}
Notice that the UTC case works and the local-time case does not.
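
To illustrate why the local wall-clock time in the error message is ambiguous, here is a minimal stdlib-only sketch. It does not use pyarrow; the fixed EDT/EST offsets are an assumption, hard-coded to stand in for the America/New_York rules so that no timezone database is needed.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for the two America/New_York regimes
# (assumption: hard-coded so the sketch needs no tz database).
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# The two UTC instants named in the error message.
utc_before = datetime(2022, 11, 6, 5, 0, tzinfo=timezone.utc)
utc_after = datetime(2022, 11, 6, 6, 0, tzinfo=timezone.utc)

# Convert each instant to its local offset and drop the offset,
# leaving only the wall-clock reading.
wall_edt = utc_before.astimezone(EDT).replace(tzinfo=None)
wall_est = utc_after.astimezone(EST).replace(tzinfo=None)

print(wall_edt, wall_est, wall_edt == wall_est)
```

Two distinct UTC instants, one hour apart, produce the same local reading, so an operation that works on the local reading alone cannot tell which instant was meant.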

 

I can try to fix this ambiguity issue by falling back to rounding in UTC. I 
would prefer to wait for ARROW-15251 to merge first, to minimise complexity.

> floor_temporal / ceil_temporal throws exception for existing timestamps if 
> ambiguous/existing
> ---------------------------------------------------------------------------------------------
>
>                 Key: ARROW-16022
>                 URL: https://issues.apache.org/jira/browse/ARROW-16022
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 7.0.0
>            Reporter: Kevin Crouse
>            Priority: Major
>
> Running pyarrow.compute.floor_temporal for timestamps that exist will throw 
> exceptions if the times are ambiguous during daylight saving time 
> transitions. 
> As the *_temporal functions do not fundamentally change the times, it does 
> not make sense that they would fail due to a timezone issue. If they must 
> fail, it should be when the pyarrow.Timestamp is created.
>  
>  
> {code:python}
> import pyarrow
> import pyarrow.compute as pc
> import datetime
> import pytz
> t = pyarrow.timestamp('s', tz='America/New_York')
> dt = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo=pytz.timezone('America/New_York'))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t)
> # raises an exception, even though this is conceptually an identity function here
> pc.floor_temporal(za, unit = 'second') {code}
>  
> And this actually works just fine (continued from above)
> {code:python}
> pc.cast(
>     pc.floor_temporal(
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),
>         unit='second',
>     ),
>     pyarrow.timestamp('s', 'America/New_York'),
> )
>  {code}
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
