lidavidm commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2508392727
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -544,6 +544,39 @@ Primitive Type Mappings
| UUID extension type | UUID |
+----------------------+------------------------+
+.. _timestamp_with_offset_extension:
+
+Timestamp With Offset
+=====================
+This type represents a timestamp column that stores potentially different
timezone offsets per value. The timestamp is stored in UTC alongside the
original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP
WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+ * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where
``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+ * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``)
representing the offset in minutes from the UTC timezone. Negative offsets
represent time zones west of UTC, while positive offsets represent east.
Offsets range from -779 (-12:59) to +780 (+13:00).
+
+* Extension type parameters:
+
+ * ``time_unit``: the time-unit of each of the stored UTC timestamps.
+
+* Description of the serialization:
+
+ Extension metadata is an empty string.
+
+.. note::
+
+ It is also *permissible* for the ``offset_minutes`` field to be
dictionary-encoded with a preferred (*but not required*) index type of
``int8``, or run-end-encoded with a preferred (*but not required*) runs type of
``int8``.
+
+.. note::
+
+ Although not required, it is *recommended* that implementations represent
this type as an RFC3339 string when de/serializing to/from JSON, respecting the
``TimeUnit`` precision and time zone offset without loss of information. For
example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second
precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one
nanosecond after January 1st 2025 in UTC-07.
+
+ The rationale behind this recommendation is that many programming languages
provide support for parsing RFC3339 out of the box, facilitating consumption of
timezone-aware JSON-encoded Arrow arrays without extra boilerplate just for
integrating with Arrow.
Review Comment:
```suggestion
.. note::
Although not required, it is *recommended* that implementations represent
this type as an RFC3339 string when de/serializing to/from JSON, respecting the
``TimeUnit`` precision and time zone offset without loss of information. For
example ``2025-01-01T00:00:00Z`` represents January 1st 2025 in UTC with second
precision, and ``2025-01-01T00:00:00.000000001-07:00`` represents one
nanosecond after January 1st 2025 in UTC-07.
The rationale behind this recommendation is that many programming
languages provide support for parsing RFC3339 out of the box, facilitating
consumption of timezone-aware JSON-encoded Arrow arrays without extra
boilerplate just for integrating with Arrow.
```
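
For anyone who wants to check the two example strings in the note, here is a minimal Python sketch of the recommended JSON representation. It is not part of the PR and not an Arrow API: the helper name `to_rfc3339` and the `_FRACTION_DIGITS` table are hypothetical, and it assumes the UTC timestamp arrives as an integer count of `time_unit` ticks since the Unix epoch.

```python
from datetime import datetime, timedelta

# Assumed mapping: fractional-second digits to emit for each Arrow TimeUnit.
_FRACTION_DIGITS = {"s": 0, "ms": 3, "us": 6, "ns": 9}


def to_rfc3339(value: int, unit: str, offset_minutes: int) -> str:
    """Format a UTC epoch count plus an offset in minutes as an RFC 3339 string.

    Hypothetical helper for illustration only.
    """
    # Range taken from the proposed spec text: -779 (-12:59) to +780 (+13:00).
    if not -779 <= offset_minutes <= 780:
        raise ValueError(f"offset_minutes out of range: {offset_minutes}")
    digits = _FRACTION_DIGITS[unit]
    # divmod floors, so the fraction stays non-negative for pre-1970 values.
    seconds, fraction = divmod(value, 10 ** digits)
    # Shift the UTC instant to the local wall-clock time that is displayed.
    local = datetime(1970, 1, 1) + timedelta(seconds=seconds, minutes=offset_minutes)
    text = local.strftime("%Y-%m-%dT%H:%M:%S")
    if digits:
        # Integer formatting keeps nanoseconds exact (datetime stops at microseconds).
        text += f".{fraction:0{digits}d}"
    if offset_minutes == 0:
        return text + "Z"
    sign = "-" if offset_minutes < 0 else "+"
    hours, minutes = divmod(abs(offset_minutes), 60)
    return f"{text}{sign}{hours:02d}:{minutes:02d}"


print(to_rfc3339(1735689600, "s", 0))              # 2025-01-01T00:00:00Z
print(to_rfc3339(1735714800000000001, "ns", -420))  # 2025-01-01T00:00:00.000000001-07:00
```

The fractional part is formatted from the integer remainder instead of going through `datetime`, so nanosecond values round-trip without loss, which is the property the note asks implementations to preserve.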