jorisvandenbossche commented on pull request #12528:
URL: https://github.com/apache/arrow/pull/12528#issuecomment-1062781499


   > While ceiling to 2h30min sounds exotic, this is a valid use case.
   
   Rok already mentioned it, but while non-existent times resulting from 
rounding are indeed a bit exotic, ambiguous times certainly are not. 
   
   To give a concrete example, take the local time "2021-10-31 02:25:00" in 
Europe/Brussels (during the DST switch) and round it to the hour:
   
   ```
   >>> import pandas as pd
   >>> import pyarrow as pa
   >>> import pyarrow.compute as pc
   >>> arr = pa.array([pd.Timestamp("2021-10-31 02:25:00")])
   >>> arr = pc.assume_timezone(arr, "Europe/Brussels", ambiguous="earliest")
   >>> arr
   <pyarrow.lib.TimestampArray object at 0x7f00c1e04760>
   [
     2021-10-31 00:25:00.000000
   ]
   
   >>> pc.round_temporal(arr, 1, "hour")
   ...
   ArrowInvalid: Local time is ambiguous: 2021-10-31 02:00:00.000000 is 
ambiguous.  It could be
   2021-10-31 02:00:00.000000 CEST == 2021-10-31 00:00:00.000000 UTC or
   2021-10-31 02:00:00.000000 CET == 2021-10-31 01:00:00.000000 UTC
   ```
   
   But indeed, in this case too we can know that "00:00:00 UTC" is closer to 
the original timestamp than "01:00:00 UTC" (since the original timestamp in UTC 
was "00:25:00 UTC"). 
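
   As a rough sketch of that idea (plain Python with the standard library 
`zoneinfo` module, not the Arrow kernel itself; the function name 
`round_to_hour_resolving_ambiguity` is made up for illustration): round the 
wall-clock time, build both interpretations of the rounded local time, and 
keep the one closest to the original instant. It does not handle 
non-existent local times.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def round_to_hour_resolving_ambiguity(utc_instant, tz_name):
    """Round to the nearest hour in local time, then resolve an ambiguous
    result by picking the candidate closest to the original instant.

    Hypothetical helper for illustration only.
    """
    tz = ZoneInfo(tz_name)
    # Wall-clock time in the target zone, with the tzinfo stripped.
    naive = utc_instant.astimezone(tz).replace(tzinfo=None)
    floored = naive.replace(minute=0, second=0, microsecond=0)
    if naive - floored >= timedelta(minutes=30):
        rounded = floored + timedelta(hours=1)
    else:
        rounded = floored
    # fold=0 is the first occurrence of an ambiguous wall time, fold=1 the
    # second. For unambiguous times both candidates are identical.
    candidates = [
        rounded.replace(tzinfo=tz, fold=fold).astimezone(timezone.utc)
        for fold in (0, 1)
    ]
    return min(candidates, key=lambda c: abs(c - utc_instant))


original = datetime(2021, 10, 31, 0, 25, tzinfo=timezone.utc)
print(round_to_hour_resolving_ambiguity(original, "Europe/Brussels"))
```

   For the timestamp above this selects the CEST interpretation 
(00:00:00 UTC); an input of 01:25:00 UTC (02:25 CET) would select the CET 
one (01:00:00 UTC).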
   
   That adds some more logic to this kernel, but this would actually make those 
round kernels more useful! 
   (for example, if you have a regular timeseries (say, at minute intervals) 
and you round it to the hour, there is no single ambiguous="latest"/"earliest" 
option that is correct for all values in your timeseries)
   
   > Or are you proposing to not raise? 
   
   If there is no ambiguity left (eg as in the example above), I think we 
should not raise by default. 
   
   But it might be that for some cases it's still better to raise by default. 
For example in the case of "non-existent" times, we are actually changing the 
resulting timestamp, and thus that also means it will not necessarily "follow" 
the rounding `multiple` and `unit`. I think in such cases, it might still be 
better to raise by default?
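
   To illustrate what "non-existent" means here, a small sketch (again 
plain Python with `zoneinfo`, not Arrow code; `is_nonexistent` is a 
hypothetical helper name): a wall-clock time that falls in the 
spring-forward gap does not survive a round trip through UTC, so any result 
we return for it is necessarily a different timestamp.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def is_nonexistent(naive_local, tz_name):
    """Return True if the naive wall-clock time falls in a DST gap.

    Hypothetical helper for illustration only.
    """
    tz = ZoneInfo(tz_name)
    # Attach the zone, convert to UTC and back. A time inside the gap comes
    # back shifted, because no real instant maps to that wall-clock time.
    roundtrip = naive_local.replace(tzinfo=tz).astimezone(timezone.utc).astimezone(tz)
    return roundtrip.replace(tzinfo=None) != naive_local


# 02:30 on 2021-03-28 is skipped in Europe/Brussels (clocks jump 02:00 -> 03:00).
print(is_nonexistent(datetime(2021, 3, 28, 2, 30), "Europe/Brussels"))
# 02:00 on 2021-10-31 is ambiguous, but it does exist.
print(is_nonexistent(datetime(2021, 10, 31, 2, 0), "Europe/Brussels"))
```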


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


