[
https://issues.apache.org/jira/browse/ARROW-16022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17512393#comment-17512393
]
Rok Mihevc commented on ARROW-16022:
------------------------------------
Hey Kevin,
I agree with you on the first paragraph.
For the second - I assume you're referring to this case:
{code:python}
import pandas as pd
import pyarrow as pa
import pyarrow.compute as pc

arr = pa.array(pd.to_datetime(["2022-11-06 05:00:00"]), pa.timestamp("ms", "America/New_York"))
pc.floor_temporal(arr, unit="second", multiple=1)
---------------------------------------------------------------------------
ArrowInvalid: Local time is ambiguous: 2022-11-06 01:00:00.000 is ambiguous. It could be
2022-11-06 01:00:00.000 EDT == 2022-11-06 05:00:00.000 UTC or
2022-11-06 01:00:00.000 EST == 2022-11-06 06:00:00.000 UTC
{code}
(If you have another example, please provide it; the more tests we have, the
more likely this is to work correctly.)
This happens because Arrow stores timestamps internally in UTC, converts them
to local time to do the rounding, and then converts the result back to UTC.
The conversion back to UTC fails here because the local time during the autumn
transition is ambiguous. I think in such cases we can simply fall back to
rounding in UTC (taking the timezone offset into account, e.g. for +04:30
offsets), and the result might even be correct. I'm not sure.
Either way, as a temporary workaround, if you keep your timestamp array in UTC
you should not see any such issues.
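
To make the ambiguity and the proposed fallback concrete, here is a minimal sketch in plain Python. It is not how Arrow implements rounding; floor_with_offset and wall_time are hypothetical helpers, and the fixed EDT/EST offsets are assumptions standing in for a real tz database lookup. The idea is that if the offset is taken from the UTC instant itself, the result never has to be resolved from an ambiguous local time.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)

# Assumption: fixed offsets stand in for the two New York offsets around
# the 2022-11-06 transition, so no tz database lookup is needed.
EDT = timedelta(hours=-4)
EST = timedelta(hours=-5)


def wall_time(ts_utc, offset):
    """Naive local wall-clock time of a UTC instant under a fixed offset."""
    return (ts_utc + offset).replace(tzinfo=None)


def floor_with_offset(ts_utc, offset, step):
    """Floor a UTC instant to a multiple of `step` measured in local time.

    Hypothetical helper: the offset belongs to the instant itself, so the
    result is never resolved from an ambiguous local time.
    """
    local_elapsed = (ts_utc - EPOCH) + offset
    floored_local = local_elapsed - (local_elapsed % step)
    return EPOCH + (floored_local - offset)


# Two distinct UTC instants share one local wall time: the ambiguity.
before = datetime(2022, 11, 6, 5, 0, tzinfo=UTC)  # 01:00 EDT
after = datetime(2022, 11, 6, 6, 0, tzinfo=UTC)   # 01:00 EST
same_wall = wall_time(before, EDT) == wall_time(after, EST)

# Flooring each instant with its own offset keeps them distinct.
hour = timedelta(hours=1)
floored_before = floor_with_offset(before + timedelta(minutes=30), EDT, hour)
floored_after = floor_with_offset(after + timedelta(minutes=30), EST, hour)

# A +04:30 zone shows why the offset cannot be ignored: the local hour
# boundary falls on the half hour in UTC.
half_hour_zone = timedelta(hours=4, minutes=30)
floored_half = floor_with_offset(
    datetime(2022, 1, 1, 10, 45, tzinfo=UTC), half_hour_zone, hour
)
```

Whether this gives the right answer for units that span the transition itself (e.g. flooring to a day) is a separate question that this sketch does not address.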
> floor_temporal / ceil_temporal throws exception for existing timestamps if
> ambiguous/existing
> ---------------------------------------------------------------------------------------------
>
> Key: ARROW-16022
> URL: https://issues.apache.org/jira/browse/ARROW-16022
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Affects Versions: 7.0.0
> Reporter: Kevin Crouse
> Priority: Major
>
> Running pyarrow.compute.floor_temporal on timestamps that exist throws an
> exception if the times are ambiguous during a daylight saving time
> transition.
> As the *_temporal functions do not fundamentally change the times, it does
> not make sense that they would fail due to a timezone issue. If they must
> fail, it should be when the pyarrow.Timestamp is created.
>
>
> {code:python}
> import pyarrow
> import pyarrow.compute as pc
> import datetime
> import pytz
>
> t = pyarrow.timestamp('s', tz='America/New_York')
> dt = datetime.datetime(2013, 11, 3, 1, 3, 14, tzinfo=pytz.timezone('America/New_York'))
> # if a timestamp must be invalid, this could fail
> za = pyarrow.array([dt], t)
> # raises an exception, even though this is conceptually an identity function here
> pc.floor_temporal(za, unit='second')
> {code}
>
> And this actually works just fine (continued from above)
> {code:python}
> pc.cast(
>     pc.floor_temporal(
>         pc.cast(za, pyarrow.timestamp('s', 'UTC')),
>         unit='second'),
>     pyarrow.timestamp('s', 'America/New_York')
> )
> {code}
>
>
>
>
--
This message was sent by Atlassian Jira
(v8.20.1#820001)