tustvold commented on issue #686: URL: https://github.com/apache/arrow-datafusion/issues/686#issuecomment-1465179078
Timezone aware timestamp parsing was recently added to arrow-rs - https://github.com/apache/arrow-rs/pull/3795 In particular it follow the approach laid out in the [arrow standard](https://github.com/apache/arrow/blob/main/format/Schema.fbs#L289) In particular > /// If a Timestamp column has a non-empty timezone, changing the timezone > /// to a different non-empty value is a metadata-only operation: > /// the timestamp values need not change as their point of reference remains > /// the same (the Unix epoch). > /// > /// However, if a Timestamp column has no timezone value, changing it to a > /// non-empty value requires to think about the desired semantics. > /// One possibility is to assume that the original timestamp values are > /// relative to the epoch of the timezone being set; timestamp values should > /// then adjusted to the Unix epoch (for example, changing the timezone from > /// empty to "Europe/Paris" would require converting the timestamp values > /// from "Europe/Paris" to "UTC", which seems counter-intuitive but is > /// nevertheless correct). This means the following parsing rules | Example | Output Arrow Type | Value | ------------------------------------------ | ------------------------------------ | ------------------------------------------------------ | to_timestamp('1970-01-01') | `Timestamp(Nanoseconds, None)` | 0 | to_timestamp('1970-01-01+00:00') | `Timestamp(Nanoseconds, None)` | 0 | to_timestamp('1970-01-01+01:00') | `Timestamp(Nanoseconds, None)` | -3600000000000 (1 hour of nanoseconds) | to_timestamp('1970-01-01', 'UTC') | `Timestamp(Nanoseconds, Utc)` | 0 | to_timestamp('1970-01-01+00:00', 'UTC') | `Timestamp(Nanoseconds, Utc)` | 0 | to_timestamp('1970-01-01+01:00', 'UTC') | `Timestamp(Nanoseconds, Utc)` | -3600000000000 (1 hour of nanoseconds) | to_timestamp('1970-01-01', '+01:00') | `Timestamp(Nanoseconds, +01:00)` | -3600000000000 (1 hour of nanoseconds) | to_timestamp('1970-01-01+00:00', '+01:00') | `Timestamp(Nanoseconds, +01:00)` | 0 | to_timestamp('1970-01-01+01:00', '+01:00') | `Timestamp(Nanoseconds, +01:00)` | -3600000000000 (1 hour of nanoseconds) Crucially this means "when parsing a timestamp without a timezone into a timestamp with a timezone, it assumes the given timestamp is already in the destination timezone". Or ``` to_timestamp('${timestamp}', '${tz}') == to_timestamp('${timestamp}${tz}', '${tz}') == to_timestamp('${timestamp}${tz}') ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
