tustvold commented on issue #686:
URL: 
https://github.com/apache/arrow-datafusion/issues/686#issuecomment-1465179078

   Timezone aware timestamp parsing was recently added to arrow-rs - 
https://github.com/apache/arrow-rs/pull/3795
   
   In particular it follow the approach laid out in the [arrow 
standard](https://github.com/apache/arrow/blob/main/format/Schema.fbs#L289)
   
   In particular
   
   > /// If a Timestamp column has a non-empty timezone, changing the timezone
   > /// to a different non-empty value is a metadata-only operation:
   > /// the timestamp values need not change as their point of reference 
remains
   > /// the same (the Unix epoch).
   > ///
   > /// However, if a Timestamp column has no timezone value, changing it to a
   > /// non-empty value requires to think about the desired semantics.
   > /// One possibility is to assume that the original timestamp values are
   > /// relative to the epoch of the timezone being set; timestamp values 
should
   > /// then adjusted to the Unix epoch (for example, changing the timezone 
from
   > /// empty to "Europe/Paris" would require converting the timestamp values
   > /// from "Europe/Paris" to "UTC", which seems counter-intuitive but is
   > /// nevertheless correct).
   
   This means the following parsing rules
   
   | Example                                           | Output Arrow Type      
              |     Value
   | ------------------------------------------ | 
------------------------------------ | 
------------------------------------------------------
   | to_timestamp('1970-01-01')                 | `Timestamp(Nanoseconds, 
None)`       | 0
   | to_timestamp('1970-01-01+00:00')           | `Timestamp(Nanoseconds, 
None)`       | 0
   | to_timestamp('1970-01-01+01:00')           | `Timestamp(Nanoseconds, 
None)`       | -3600000000000 (1 hour of nanoseconds)
   | to_timestamp('1970-01-01', 'UTC')          | `Timestamp(Nanoseconds, Utc)` 
       | 0
   | to_timestamp('1970-01-01+00:00', 'UTC')    | `Timestamp(Nanoseconds, Utc)` 
       | 0
   | to_timestamp('1970-01-01+01:00', 'UTC')    | `Timestamp(Nanoseconds, Utc)` 
       | -3600000000000 (1 hour of nanoseconds) 
   | to_timestamp('1970-01-01', '+01:00')          | `Timestamp(Nanoseconds, 
+01:00)`        | -3600000000000 (1 hour of nanoseconds)
   | to_timestamp('1970-01-01+00:00', '+01:00')    | `Timestamp(Nanoseconds, 
+01:00)`        | 0
   | to_timestamp('1970-01-01+01:00', '+01:00')    | `Timestamp(Nanoseconds, 
+01:00)`        | -3600000000000 (1 hour of nanoseconds)
   
   Crucially this means "when parsing a timestamp without a timezone into a 
timestamp with a timezone, it assumes the given timestamp is already in the 
destination timezone". Or 
   
   ```
   to_timestamp('${timestamp}', '${tz}') == to_timestamp('${timestamp}${tz}', 
'${tz}') == to_timestamp('${timestamp}${tz}')
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to