serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2546657710


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -544,6 +544,49 @@ Primitive Type Mappings
 | UUID extension type  | UUID                   |
 +----------------------+------------------------+
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a preferably non-nullable ``Timestamp(time_unit, "UTC")``, 
where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a preferably non-nullable signed 16-bit integer 
(``Int16``) representing the offset in minutes from the UTC timezone. Negative 
offsets represent time zones west of UTC, while positive offsets represent 
east. Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded or run-end-encoded.
+
+.. note::
+
+   It is also *permissible* for ``timestamp`` and ``offset_minutes`` to be 
nullable, but not recommended.
+
+   If ``timestamp`` is nullable and a value is found to be null, then the 
whole ``TimestampWithOffset`` value should be interpreted as null. One way of 
achieving this is to drop ``timestamp``'s validity buffer (V1) and replace the 
top-level struct's validity buffer (V2) with the result of ``V1 AND V2``.
+
+   If ``offset_minutes`` is nullable and a value is found to be null, then 
this value should be interpreted as if the offset value were were zero.
+
+   It is *recommended* that implementations normalize this type's 
representation by dropping the inner validity buffers and applying the 
aforementioned transformations, only keeping the top-level struct's validity 
buffer.
+
+.. note::

Review Comment:
   I added this mostly to make integrating to Arrow via JSON more ergonomic so 
people can do stuff like `time.Parse()` in Go or `datetime.fromisoformat()` in 
Python, instead of having to manually parse something like `{"timestamp": 
"...", "offset": 123}` to get timezone-aware objects in their language.
   
   I can take this off of the spec if you think it doesn't belong here though.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to