waitingkuo commented on issue #4106: URL: https://github.com/apache/arrow-datafusion/issues/4106#issuecomment-1304872305
@avantgardnerio thank you. I didn't aware this before. never live in the area that has the timezone switch. I did some research in Postgrseql and Chrono-tz MST timezone offset: -7 MDT timezone offset: -6 In 2022, MDT began from Sunday, 13 March, 02:00 (changed from -7 to -6) ended at Sunday, 6 November, 02:00 (changed from -6 to -7) ```bash willy=# set timezone to 'America/Denver'; SET ``` this is valid (right before the timezone shift): ```bash willy=# select to_timestamp('2022-03-13 01:00', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 01:00:00-07 (1 row) willy=# select to_timestamp('2022-03-13 01:59', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 01:59:00-07 (1 row) ``` begin from 2:00, it switches to MDT (-6) i think the logic behind for this hour is: parse as it's MST(-7), and then switch to MDT (-6), that's why we have an one hour shift here ```bash willy=# select to_timestamp('2022-03-13 02:00', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 03:00:00-06 (1 row) willy=# select to_timestamp('2022-03-13 02:30', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 03:30:00-06 (1 row) willy=# select to_timestamp('2022-03-13 02:59', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 03:59:00-06 (1 row) ``` and then the next hour it's parsed as MDT ```bash willy=# select to_timestamp('2022-03-13 03:00', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-03-13 03:00:00-06 (1 row) ``` In November 6th 2am MDT, time zone is switched back to MST (1am MST) let's begin with something unambiguous ``` willy=# select to_timestamp('2022-11-06 00:59', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-11-06 00:59:00-06 (1 row) ``` for the next 1 hour, it's ambiguous since both MST and MDT has 1 am. and this is what postgresql does, it uses MST even though 1am MDT is valid as well ```bash willy=# select to_timestamp('2022-11-06 01:00', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-11-06 01:00:00-07 (1 row) willy=# select to_timestamp('2022-11-06 01:59', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-11-06 01:59:00-07 (1 row) ``` after that, there's no ambiguity ```bash willy=# select to_timestamp('2022-11-06 02:00', 'YYYY-MM-DD HH24:MI'); to_timestamp ------------------------ 2022-11-06 02:00:00-07 (1 row) ``` conclusion for Postgrseql: when it's ambiguous or invalid, it's parsed as MST, and then switches to MDT if needed now let's see the behavior for `chrono-tz` ```rust let tz: Tz = "America/Denver".parse().unwrap(); let dt = tz.datetime_from_str("2022-03-13T01:59", "%Y-%m-%dT%H:%M"); println!("2022-03-13T01:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339()); let dt = tz.datetime_from_str("2022-03-13T02:00", "%Y-%m-%dT%H:%M"); println!("2022-03-13T02:00 -> {:?}", dt); let dt = tz.datetime_from_str("2022-03-13T02:59", "%Y-%m-%dT%H:%M"); println!("2022-03-13T02:59 -> {:?}", dt); let dt = tz.datetime_from_str("2022-03-13T03:00", "%Y-%m-%dT%H:%M"); println!("2022-03-13T03:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339()); ``` ```bash 2022-03-13T01:59 -> Ok(2022-03-13T01:59:00MST) / 2022-03-13T01:59:00-07:00 2022-03-13T02:00 -> Err(ParseError(Impossible)) 2022-03-13T02:59 -> Err(ParseError(Impossible)) 2022-03-13T03:00 -> Ok(2022-03-13T03:00:00MDT) / 2022-03-13T03:00:00-06:00 ``` ```rust let dt = tz.datetime_from_str("2022-11-06T00:59", "%Y-%m-%dT%H:%M"); println!("2022-11-06T00:59 -> {:?} / {}", dt, dt.unwrap().to_rfc3339()); let dt = tz.datetime_from_str("2022-11-06T01:00", "%Y-%m-%dT%H:%M"); println!("2022-11-06T01:00 -> {:?}", dt); let dt = tz.datetime_from_str("2022-11-06T01:59", "%Y-%m-%dT%H:%M"); println!("2022-11-06T01:59 -> {:?}", dt); let dt = tz.datetime_from_str("2022-11-06T02:00", "%Y-%m-%dT%H:%M"); println!("2022-11-06T02:00 -> {:?} / {}", dt, dt.unwrap().to_rfc3339()); ``` ```bash 2022-11-06T00:59 -> Ok(2022-11-06T00:59:00MDT) / 2022-11-06T00:59:00-06:00 2022-11-06T01:00 -> Err(ParseError(NotEnough)) 2022-11-06T01:59 -> Err(ParseError(NotEnough)) 2022-11-06T02:00 -> Ok(2022-11-06T02:00:00MST) / 2022-11-06T02:00:00-07:00 ``` I think `chrono-tz` did a good job here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org