jorisvandenbossche commented on pull request #12528:
URL: https://github.com/apache/arrow/pull/12528#issuecomment-1062781499
> While ceiling to 2h30min sounds exotic, this is a valid use case.
Rok already mentioned this, but while non-existent times produced by
rounding are a bit exotic, ambiguous times certainly are not.
To give a concrete example, assume the local time "2021-10-31 02:25:00" in
Europe (during a DST switch) and rounding that to the hour:
```
>>> import pandas as pd
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.array([pd.Timestamp("2021-10-31 02:25:00")])
>>> arr = pc.assume_timezone(arr, "Europe/Brussels", ambiguous="earliest")
>>> arr
<pyarrow.lib.TimestampArray object at 0x7f00c1e04760>
[
2021-10-31 00:25:00.000000
]
>>> pc.round_temporal(arr, 1, "hour")
...
ArrowInvalid: Local time is ambiguous: 2021-10-31 02:00:00.000000 is
ambiguous. It could be
2021-10-31 02:00:00.000000 CEST == 2021-10-31 00:00:00.000000 UTC or
2021-10-31 02:00:00.000000 CET == 2021-10-31 01:00:00.000000 UTC
```
But indeed, in this case too we can tell that "00:00:00 UTC" is closer to
the original timestamp than "01:00:00 UTC" (since the original timestamp in UTC
was "00:25:00 UTC").
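To make that "closer to the original" reasoning concrete, here is a rough sketch in plain Python (not the Arrow kernel). The helper `resolve_ambiguous` is hypothetical, and the two UTC offsets are hardcoded as an assumption instead of being looked up from the tz database:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for Europe/Brussels around the
# 2021-10-31 switch (CEST = UTC+2 before, CET = UTC+1 after).
CEST = timezone(timedelta(hours=2), "CEST")
CET = timezone(timedelta(hours=1), "CET")


def resolve_ambiguous(original_utc, rounded_local_naive, offsets=(CEST, CET)):
    """Hypothetical helper: interpret an ambiguous local time under each
    offset and keep the instant nearest to the original timestamp."""
    candidates = [
        rounded_local_naive.replace(tzinfo=tz).astimezone(timezone.utc)
        for tz in offsets
    ]
    return min(candidates, key=lambda c: abs(c - original_utc))


original_utc = datetime(2021, 10, 31, 0, 25, tzinfo=timezone.utc)
rounded_local = datetime(2021, 10, 31, 2, 0)  # local 02:25 rounded to the hour
resolved = resolve_ambiguous(original_utc, rounded_local)
print(resolved)  # 2021-10-31 00:00:00+00:00
```

The candidate at 00:00 UTC is 25 minutes from the original and the one at 01:00 UTC is 35 minutes away, so the first is picked without needing an `ambiguous=` option.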
That adds some more logic to this kernel, but this would actually make those
round kernels more useful!
(For example, if you have a regular timeseries, say at one-minute intervals, and
you round it to the hour, you could never pick a single ambiguous="latest"/"earliest"
option that is correct for all values in your timeseries.)
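A small illustration of that point, again in plain Python with hardcoded offsets (an assumption, not how the kernel works): two instants one hour apart both read 02:25 locally and both round to local 02:00, yet each needs a different fixed option. The helper `nearest_choice` is hypothetical.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: hardcoded offsets for the repeated 02:00-03:00 local hour.
EARLIEST = timezone(timedelta(hours=2))  # CEST, first pass through 02:xx
LATEST = timezone(timedelta(hours=1))    # CET, second pass through 02:xx

# Both instants below read 02:25 locally and round to local 02:00.
rounded_local = datetime(2021, 10, 31, 2, 0)


def nearest_choice(instant_utc):
    """Hypothetical helper: name the fixed option whose result is nearest."""
    options = {
        "earliest": rounded_local.replace(tzinfo=EARLIEST),
        "latest": rounded_local.replace(tzinfo=LATEST),
    }
    return min(options, key=lambda name: abs(options[name] - instant_utc))


series = [
    datetime(2021, 10, 31, 0, 25, tzinfo=UTC),  # 02:25 CEST
    datetime(2021, 10, 31, 1, 25, tzinfo=UTC),  # 02:25 CET
]
choices = [nearest_choice(t) for t in series]
print(choices)  # ['earliest', 'latest']
```

Neither fixed setting matches both values, which is why resolving per value from the original timestamp would be more useful here.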
> Or are you proposing to not raise?
If there is no ambiguity left (e.g., as in the example above), I think we
should not raise by default.
But it might be that for some cases it's still better to raise by default.
For example in the case of "non-existent" times, we are actually changing the
resulting timestamp, and thus that also means it will not necessarily "follow"
the rounding `multiple` and `unit`. I think in such cases, it might still be
better to raise by default?