appletreeisyellow commented on issue #10602:
URL: https://github.com/apache/datafusion/issues/10602#issuecomment-2128168367
To make `date_bin` timezone aware, there are some edge cases we need to
consider when designing it:
1. **Daylight Saving Time (DST) Transitions:**
- Spring Forward: When the clocks move forward, there is a "missing"
hour.
- Fall Back: When the clocks move backward, there is an "extra" hour.
For example, in the US Central time zone, when DST ends at 2:00 AM, the clocks
are set back to 1:00 AM. This means that 1:20 AM occurs twice that day. If a
user runs `date_bin` with a 10 min interval, how should the returned data be
handled? Aggregate both occurrences into one bin? Return two sets of data?
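
To make the ambiguity concrete, here is a minimal Python sketch using the standard-library `zoneinfo` module (it assumes the IANA tz database is available on the machine). `bin_to_local` is a hypothetical helper written for this illustration, not DataFusion's `date_bin`: it bins in UTC and only afterwards converts the bin start to local time. 2024-11-03 is the US fall-back day, chosen as an example.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc
EPOCH = datetime(1970, 1, 1, tzinfo=UTC)
CHICAGO = ZoneInfo("America/Chicago")


def bin_to_local(ts_utc: datetime, minutes: int, tz: ZoneInfo) -> datetime:
    """Hypothetical helper: floor a UTC instant to a `minutes`-wide bin
    counted from the Unix epoch, then express the bin start in `tz`."""
    width = timedelta(minutes=minutes)
    floored = EPOCH + ((ts_utc - EPOCH) // width) * width
    return floored.astimezone(tz)


# Two distinct instants, one hour apart, on the 2024 fall-back day.
first = bin_to_local(datetime(2024, 11, 3, 6, 23, tzinfo=UTC), 10, CHICAGO)
second = bin_to_local(datetime(2024, 11, 3, 7, 23, tzinfo=UTC), 10, CHICAGO)

# Both bins read 01:20 on the wall clock; only the UTC offset differs.
print(first.isoformat())   # 2024-11-03T01:20:00-05:00
print(second.isoformat())  # 2024-11-03T01:20:00-06:00
```

If the output type carries no offset, these two bins get the same label, which is exactly the "aggregate or return two sets" question.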
2. **Crossing Midnight Boundaries:**
- Ensure that timestamps are correctly binned at the right local day and
hour boundaries, especially when converting from UTC to a local timezone. This
edge case is what this issue is trying to solve.
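
A small Python sketch of the difference, assuming a 1-day bin and the `America/Chicago` zone as an example. Both function names are hypothetical and only illustrate the two possible behaviours; neither is DataFusion's implementation.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

UTC = timezone.utc


def utc_day_start(ts_utc: datetime) -> datetime:
    """Bin to the start of the UTC day (timezone-unaware behaviour)."""
    return ts_utc.replace(hour=0, minute=0, second=0, microsecond=0)


def local_day_start(ts_utc: datetime, tz: ZoneInfo) -> datetime:
    """Bin to the start of the local day in `tz`.
    Assumes local midnight exists on that day, which is not true
    for every zone and date."""
    local = ts_utc.astimezone(tz)
    return local.replace(hour=0, minute=0, second=0, microsecond=0, fold=0)


ts = datetime(2024, 7, 1, 2, 30, tzinfo=UTC)  # 21:30 on June 30 in Chicago

print(utc_day_start(ts).isoformat())
# 2024-07-01T00:00:00+00:00
print(local_day_start(ts, ZoneInfo("America/Chicago")).isoformat())
# 2024-06-30T00:00:00-05:00
```

The same instant lands in the July 1 bin when binned in UTC and in the June 30 bin when binned in local time.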
3. **Timezone Offsets:**
- Different timezones have different offsets from UTC, and these can
change over time (e.g., due to DST).
- For certain timezones, offsets or DST shifts are not whole hours, so ensure
binning works correctly at the 30-min and 45-min marks. e.g.:
- Lord Howe Island Time Zone (LHST) in Australia shifts time by 30
min during DST
- Nepal (NPT) has a fixed offset of UTC+05:45 and does not observe DST
- Chatham Islands, New Zealand (CHAST) are at UTC+12:45, and at UTC+13:45
during DST.
- Ensure the function correctly applies the current offset for the given
timestamp, considering historical changes in timezone rules. e.g.:
- America/Sao_Paulo stopped observing DST in 2019
- Russia set the clocks ahead permanently in 2011, then reversed that
in 2014
- Iceland, Turkey, Egypt, India, and China have also changed their timezone
or DST rules over the years
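
The offsets for these zones can be checked against the IANA tz database. The sketch below uses Python's standard-library `zoneinfo` (assuming the tz database is installed); `offset_on` is a hypothetical helper and the sample dates are arbitrary picks in January and July.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def offset_on(zone: str, year: int, month: int, day: int) -> timedelta:
    """UTC offset in effect in `zone` at 12:00 UTC on the given date."""
    instant = datetime(year, month, day, 12, tzinfo=timezone.utc)
    return instant.astimezone(ZoneInfo(zone)).utcoffset()


# Non-whole-hour offsets: compare a January and a July date.
for zone in ("Australia/Lord_Howe", "Asia/Kathmandu", "Pacific/Chatham"):
    print(zone, offset_on(zone, 2024, 1, 15), offset_on(zone, 2024, 7, 15))

# Historical rule change: the same calendar day before and after
# Brazil dropped DST.
print(
    "America/Sao_Paulo",
    offset_on("America/Sao_Paulo", 2018, 1, 15),
    offset_on("America/Sao_Paulo", 2020, 1, 15),
)
```

The takeaway for `date_bin` is that the offset has to be looked up per timestamp; a single offset per zone, or even per zone and month, is not enough.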
4. **Leap Years and Leap Seconds:**
- Leap Years: ensure February 29th is handled correctly in leap years.
- Leap Seconds: these are less common. The most recent one was introduced on
Dec. 31, 2016, at 23:59:60 UTC
5. **Handling Null and Invalid Timestamps:**
- Ensure that null or invalid timestamps are handled gracefully, either
by ignoring them or providing a default bin. e.g.
`'2021-03-28T02:30:00' AT TIME ZONE 'Europe/Brussels'` does not exist, because
the clocks jumped from 02:00 to 03:00 that day
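
One way to detect such a nonexistent local time is a UTC round trip: a wall-clock time inside a DST gap does not survive conversion to UTC and back. This is a Python sketch with the standard-library `zoneinfo`; `exists_in_zone` is a hypothetical name, and what `date_bin` should do once a gap is detected (error, null, shift forward) is left open.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def exists_in_zone(naive: datetime, tz: ZoneInfo) -> bool:
    """Return True if the wall-clock time `naive` occurs in `tz`.
    A time that falls in a DST gap comes back shifted after a
    round trip through UTC."""
    local = naive.replace(tzinfo=tz)
    round_trip = local.astimezone(timezone.utc).astimezone(tz)
    return round_trip.replace(tzinfo=None) == naive


brussels = ZoneInfo("Europe/Brussels")

print(exists_in_zone(datetime(2021, 3, 28, 2, 30), brussels))  # False
print(exists_in_zone(datetime(2021, 3, 28, 3, 30), brussels))  # True
```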
6. **Time Precision:**
- Ensure that the function handles different precisions of timestamps
correctly (e.g., seconds, milliseconds, nanoseconds).
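
As an illustration of the precision point, the sketch below bins integer epoch values with integer arithmetic only, so nanosecond values are not rounded through floating point. `bin_epoch` and the unit table are hypothetical names for this example and ignore timezones entirely.

```python
UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}


def bin_epoch(value: int, unit: str, interval_seconds: int, origin: int = 0) -> int:
    """Floor an integer epoch `value` (in `unit`) to the start of its bin.
    `origin` is expressed in the same unit as `value`."""
    width = interval_seconds * UNITS_PER_SECOND[unit]
    # Floor division rounds toward negative infinity, so timestamps
    # before the origin still land at the start of their bin.
    return origin + ((value - origin) // width) * width


# The same 10-minute bin at three precisions.
print(bin_epoch(1_700_000_123, "s", 600))                  # 1699999800
print(bin_epoch(1_700_000_123_456, "ms", 600))             # 1699999800000
print(bin_epoch(1_700_000_123_456_789_000, "ns", 600))     # 1699999800000000000
print(bin_epoch(-1, "s", 600))                             # -600
```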
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]