serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2479249353


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -483,6 +483,28 @@ binary values look like.
 
 .. _variant_primitive_type_mapping:
 
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).

Review Comment:
   But...
   
   the main reason behind this proposal is compatibility with ANSI SQL 
`TIMESTAMP WITH TIME ZONE`, which is supported by multiple database systems 
(Snowflake, MS SQL Server, Oracle, Trino).
   
   This is the reasoning behind why we're proposing an offset in minutes as 
signed 16-bit int:
   
   > In ANSI SQL, the time zone information is defined in terms of an 
"INTERVAL" offset ranging from "INTERVAL - '12:59' HOUR TO MINUTE" to "INTERVAL 
+ '13:00' HOUR TO MINUTE". Since "MINUTE" is the smallest granularity with 
which you can represent a time zone offset, and the maximum minutes in the 
offset is 13*60=780, we believe it makes sense for the offset to be stored as a 
16-bit integer in minutes.
   
   > It is important to point out that some systems such as MS SQL Server do 
implement data types that can represent offsets with sub-minute granularity. We 
believe representing sub-minute granularity is out of scope for this proposal 
given that no current or past time zone standards have ever specified 
sub-minute offsets [9], and that is what we're trying to solve for. 
Furthermore, representing the offset in seconds rather than minutes would mean 
the maximum offset is 13*60*60=46800, which is greater than the maximum 
positive integer an int16 can represent (32768), and thus the offset type would 
need to be wider (int32).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to