[
https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512034#comment-17512034
]
Rok Mihevc commented on ARROW-16022:
------------------------------------
Thanks for reporting this [~krcrouse], it's good to know this is needed.
There's an [open PR|https://github.com/apache/arrow/pull/12528] that will
change the way this is handled. Would this solve your issue?
> If they must fail, it should be when the pyarrow.Timestamp is created.
I'm not sure we want to validate at creation time by default as it would add
lots of overhead. We typically create timestamp arrays by assigning a timezone
to UTC timestamps. This means all timestamps should exist in local time (I
think), but some will be ambiguous. We could add an is_ambivalent or
ambivalent_to_null option or something like that.
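
To illustrate the point about UTC-backed timestamps, here is a rough sketch using only the Python standard library (zoneinfo, Python 3.9+), not pyarrow. The helper name is_ambiguous is hypothetical and is not an existing Arrow function; it only shows what such a check could look like. Two different UTC instants both exist in local time but share one wall-clock reading:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library, Python 3.9+

NY = ZoneInfo("America/New_York")


def is_ambiguous(wall, tz):
    """Hypothetical helper: True if the naive wall-clock time occurs twice in tz."""
    first = wall.replace(tzinfo=tz, fold=0)
    second = wall.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return False  # only one interpretation: an ordinary time
    # Offsets also differ inside a spring-forward gap; a real (ambiguous)
    # time survives a round trip through UTC, a nonexistent one does not.
    round_trip = first.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == wall


# Two distinct UTC instants, one hour apart, around the 2013 fall-back.
before = datetime(2013, 11, 3, 5, 3, 14, tzinfo=timezone.utc).astimezone(NY)
after = datetime(2013, 11, 3, 6, 3, 14, tzinfo=timezone.utc).astimezone(NY)

# Both exist locally and show the same wall clock; only `fold` differs.
same_wall_clock = before.replace(tzinfo=None) == after.replace(tzinfo=None)
folds = (before.fold, after.fold)
```

Going from UTC to local time never fails here; the ambiguity only shows up when a local wall-clock time has to be mapped back to UTC without knowing which of the two instants was meant.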
> floor_temporal / ceil_temporal throws exception for existing timestamps if
> ambiguous/existing
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-16022
> URL: https://issues.apache.org/jira/browse/ARROW-16022
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: Kevin Crouse
> Priority: Major
>
> Running pyarrow.compute.floor_temporal for timestamps that exist will throw
> exceptions if the times are ambiguous during the daylight saving time
> transitions.
> As the *_temporal functions do not fundamentally change the times, it does
> not make sense that they would fail due to a timezone issue. If they must
> fail, it should be when the pyarrow.Timestamp is created.
>
>
> {code:python}
> import pyarrow
> import pyarrow.compute as pc
> import datetime
> import pytz
> t = pyarrow.timestamp('s', tz='America/New_York')
> dt = datetime.datetime(2013, 11, 3, 1, 3, 14,
>                        tzinfo=pytz.timezone('America/New_York'))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t)
> # raises an exception, even though this is conceptually an identity function here
> pc.floor_temporal(za, unit='second')
> {code}
>
> And this actually works just fine (continued from above)
> {code:python}
> pc.cast(
>     pc.floor_temporal(
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),
>         unit='second'),
>     pyarrow.timestamp('s', 'America/New_York')
> )
> {code}
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)