westonpace commented on a change in pull request #10997:
URL: https://github.com/apache/arrow/pull/10997#discussion_r696085932



##########
File path: format/Schema.fbs
##########
@@ -214,58 +214,123 @@ table Time {
   bitWidth: int = 32;
 }
 
-/// Time elapsed from the Unix epoch, 00:00:00.000 on 1 January 1970, excluding
-/// leap seconds, as a 64-bit integer. Note that UNIX time does not include
-/// leap seconds.
+/// Timestamp is a 64-bit signed integer representing an elapsed time since a
+/// fixed epoch, stored in either of four units: seconds, milliseconds,
+/// microseconds or nanoseconds, and is optionally annotated with a timezone.
+///
+/// Timestamp values do not include any leap seconds (in other words, all
+/// days are considered 86400 seconds long).
+///
+/// Timestamps with a non-empty timezone
+/// ------------------------------------
+///
+/// If a Timestamp column has a non-empty timezone value, its epoch is
+/// 1970-01-01 00:00:00 (January 1st 1970, midnight) in the *UTC* timezone
+/// (the Unix epoch), regardless of the Timestamp's own timezone.
+///
+/// Therefore, timestamp values with a non-empty timezone correspond to
+/// physical points in time together with some additional information about
+/// how the data was obtained and/or how to display it (the timezone).
+///
+///   For example, the timestamp value 0 with the timezone string "Europe/Paris"
+///   corresponds to "January 1st 1970, 00h00" in the UTC timezone, but could
+///   also be displayed as "January 1st 1970, 01h00" in the Europe/Paris timezone
+///   (which is the same physical point in time).

Review comment:
   +0 (on this section only, +1 elsewhere). I think this section is better but still not clear. I don't know if it can be made clear, because users are free to give the timezone string whatever semantic meaning they want. The timezone string could be the data author's personal favorite time zone and it would still be valid Arrow data.
   
   > /// ... but could
   > ///   also be displayed as "January 1st 1970, 01h00" in the Europe/Paris timezone
   > ///   (which is the same physical point in time).
   
   Isn't this true even if the timezone is set to "UTC" or "America/New_York"?
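
To make the question concrete, here is a minimal sketch using only Python's standard library (not Arrow's API; the `render` helper is a hypothetical name introduced for illustration). Under the proposed wording, the value 0 names the same instant whatever the timezone string is, and only the displayed wall-clock time changes:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

def render(value_seconds: int, tz_name: str) -> str:
    """Hypothetical helper: show an epoch value as wall-clock time in tz_name."""
    # The value always counts from the Unix epoch in UTC, regardless of tz_name.
    instant = datetime.fromtimestamp(value_seconds, tz=timezone.utc)
    # The timezone string only affects how that instant is displayed.
    return instant.astimezone(ZoneInfo(tz_name)).strftime("%Y-%m-%d %H:%M")

for name in ("UTC", "Europe/Paris", "America/New_York"):
    print(name, render(0, name))
```

All three strings describe one physical point in time, so the "could also be displayed as" sentence holds equally for any timezone string.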
   
   Maybe we should add something along the lines of...
   
   ```
   /// Therefore, timestamp values with a non-empty timezone correspond to
   /// physical points in time together with some additional information about
   /// how the data was obtained and/or how to display it (the timezone).  The
   /// producer of the data determines the semantic meaning of the timezone
   /// string.
   ///
   ///   For example, the timestamp value 0 with the timezone string "Europe/Paris"
   ///   corresponds to "January 1st 1970, 00h00" in the UTC timezone, but the
   ///   data producer may prefer it to be displayed as "January 1st 1970, 01h00"
   ///   in the Europe/Paris timezone (which is the same physical point in time).
   ///   The data producer would need to be contacted or state elsewhere how
   ///   the timezone string should be used.
   ```
   
   ...but I'm not all that convinced it is better either.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
