lidavidm commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2508392727


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -544,6 +544,39 @@ Primitive Type Mappings
 | UUID extension type  | UUID                   |
 +----------------------+------------------------+
 
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different 
timezone offsets per value. The timestamp is stored in UTC alongside the 
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP 
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where 
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) 
representing the offset in minutes from the UTC timezone. Negative offsets 
represent time zones west of UTC, while positive offsets represent east. 
Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+  * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+  Extension metadata is an empty string.
+
+.. note::
+
+   It is also *permissible* for the ``offset_minutes`` field to be 
dictionary-encoded with a preferred (*but not required*) index type of 
``int8``, or run-end-encoded with a preferred (*but not required*) runs type of 
``int8``.
+
+.. note::
+
+  Although not required, it is *recommended* that implementations represent 
this type as an RFC3339 string when de/serializing to/from JSON, respecting the 
``TimeUnit`` precision and time zone offset without loss of information. For 
example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second 
precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one 
nanosecond after January 1st 2025 in UTC-07.
+
+  The rationale behind this recommendation is that many programming languages 
provide support for parsing RFC3339 out of the box, facilitating consumption of 
timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for 
integrating with Arrow.

Review Comment:
   ```suggestion
   .. note::
   
      Although not required, it is *recommended* that implementations represent 
this type as an RFC3339 string when de/serializing to/from JSON, respecting the 
``TimeUnit`` precision and time zone offset without loss of information. For 
example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second 
precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one 
nanosecond after January 1st 2025 in UTC-07.
   
      The rationale behind this recommendation is that many programming 
languages provide support for parsing RFC3339 out of the box, facilitating 
consumption of timezone-aware JSON-encoded Arrow arrays without extra 
boilerplate just for integrating with Arrow.
   ```
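
To make the recommendation in the note concrete, here is a minimal sketch of the RFC3339 rendering, assuming one value arrives as an integer count of `time_unit` ticks since the Unix epoch (the `timestamp` field) plus its `offset_minutes`. The helper name `to_rfc3339` is hypothetical and is not part of any Arrow API; it uses only the Python standard library.

```python
from datetime import datetime, timedelta

# Number of fractional-second digits for each Arrow TimeUnit.
_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def to_rfc3339(value: int, unit: str, offset_minutes: int) -> str:
    """Render one (UTC timestamp, offset) pair as an RFC3339 string.

    Hypothetical helper for illustration only. `value` is the stored UTC
    timestamp as an integer count of `unit` ticks since the Unix epoch.
    """
    digits = _FRACTION_DIGITS[unit]
    # Integer arithmetic keeps nanosecond precision (floats would lose it).
    seconds, fraction = divmod(value, 10 ** digits)
    # Shift the UTC instant to the local wall-clock time of the offset.
    local = datetime(1970, 1, 1) + timedelta(seconds=seconds, minutes=offset_minutes)
    text = local.isoformat(timespec="seconds")
    if digits:
        text += f".{fraction:0{digits}d}"
    if offset_minutes == 0:
        return text + "Z"
    sign = "+" if offset_minutes > 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"


# The two examples from the note:
print(to_rfc3339(1735689600, "s", 0))
print(to_rfc3339(1735714800000000001, "ns", -420))
```

The number of fractional digits follows the `TimeUnit`, so a reader can recover both the precision and the original offset from the string alone.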



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.