serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2493412961


##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -544,6 +544,29 @@ Primitive Type Mappings
 | UUID extension type  | UUID                   |
 +----------------------+------------------------+
 
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
+This extension type is intended to be compatible with ANSI SQL's ``TIMESTAMP WITH TIME ZONE``, which is supported by multiple database engines.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).
+
+  * ``offset_minutes``: a non-nullable signed 16-bit integer (``Int16``) representing the offset in minutes from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent east. Offsets range from -779 (-12:59) to +780 (+13:00).
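
To make the storage layout in this hunk concrete, here is a minimal Python sketch (standard library only, not the Arrow API) that splits a timezone-aware `datetime` into the two struct fields and reassembles it. The `encode`/`decode` helpers are hypothetical, the sketch fixes `time_unit` to microseconds (Python's `datetime` resolution), and it validates offsets against the -779..+780 range quoted above.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple

# Bounds quoted from the proposed spec text: -779 (-12:59) .. +780 (+13:00).
MIN_OFFSET_MINUTES = -779
MAX_OFFSET_MINUTES = 780
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def encode(dt: datetime) -> Tuple[int, int]:
    """Hypothetical helper: split an aware datetime into
    (UTC microseconds since the epoch, offset_minutes)."""
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("datetime must be timezone-aware")
    # divmod on timedeltas returns (int quotient, timedelta remainder).
    offset_minutes, rest = divmod(offset, timedelta(minutes=1))
    if rest:
        raise ValueError("offset is not a whole number of minutes")
    if not MIN_OFFSET_MINUTES <= offset_minutes <= MAX_OFFSET_MINUTES:
        raise ValueError(f"offset {offset_minutes} min is out of range")
    # Subtracting two aware datetimes yields the elapsed time in UTC.
    timestamp_us = (dt - EPOCH) // timedelta(microseconds=1)
    return timestamp_us, offset_minutes


def decode(timestamp_us: int, offset_minutes: int) -> datetime:
    """Hypothetical helper: rebuild the original wall-clock time."""
    utc = EPOCH + timedelta(microseconds=timestamp_us)
    return utc.astimezone(timezone(timedelta(minutes=offset_minutes)))
```

For 09:30 at UTC-05:00, `encode` returns an offset of -300 and the UTC instant 14:30; `decode` turns that pair back into 09:30 at UTC-05:00.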

Review Comment:
   There was a request on the mailing list to add dictionary encoding and run-end encoding to the offset column.
   
   I don't see why we wouldn't want run-end encoding: for large columns with many repeated offsets, it could save a lot of space.
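
   A rough sketch of the savings, in plain Python rather than any Arrow API: in Arrow's run-end encoded layout, a `run_ends` child holds the exclusive end index of each run and a `values` child holds one value per run. The helper names below are hypothetical, `run_ends` is assumed to be `Int32`, and byte counts ignore validity bitmaps and buffer padding.

```python
from typing import List, Tuple


def run_end_encode(offsets: List[int]) -> Tuple[List[int], List[int]]:
    """Return (run_ends, values); run_ends[i] is the exclusive end of run i."""
    run_ends: List[int] = []
    values: List[int] = []
    for i, value in enumerate(offsets):
        if values and values[-1] == value:
            run_ends[-1] = i + 1  # extend the current run
        else:
            values.append(value)  # start a new run
            run_ends.append(i + 1)
    return run_ends, values


def run_end_decode(run_ends: List[int], values: List[int]) -> List[int]:
    """Expand (run_ends, values) back into one offset per row."""
    out: List[int] = []
    start = 0
    for end, value in zip(run_ends, values):
        out.extend([value] * (end - start))
        start = end
    return out


def plain_bytes(n_rows: int) -> int:
    return 2 * n_rows  # one Int16 offset per row


def ree_bytes(n_runs: int, run_end_width: int = 4) -> int:
    # One run end (Int32 by default) plus one Int16 value per run.
    return n_runs * (run_end_width + 2)
```

   For 1,000,000 rows that fall into 3 runs, that is 2,000,000 bytes plain versus 18 bytes run-end encoded. In the worst case (the offset changes on every row) run-end encoding costs 6,000,000 bytes, three times the plain layout, so it only pays off when runs are long.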
   
   **Should we add it to the spec now, to avoid a breaking change later?**
   
   For dictionary encoding: is it possible to use `uint8`, or something even smaller, for the dictionary indices? Otherwise it only adds indirection without saving much space. The docs suggest `int32` indices, which would actually be worse: 4 bytes per row versus 2 bytes for a plain `Int16` offset.
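
   The arithmetic behind the `uint8` question, as a rough Python sketch: a dictionary-encoded column stores one index per row plus one `Int16` per distinct offset. The Arrow columnar format allows 8- to 64-bit integer index types (signed ones are recommended), so `int8` (up to 128 distinct offsets) or `uint8` (up to 256) would be expressible. Helper names are hypothetical; byte counts ignore validity bitmaps and padding.

```python
def plain_bytes(n_rows: int) -> int:
    return 2 * n_rows  # one Int16 offset per row


def dictionary_bytes(n_rows: int, n_distinct: int, index_width: int) -> int:
    # One index per row plus one Int16 per distinct offset in the dictionary.
    return n_rows * index_width + 2 * n_distinct


def smallest_index_width(n_distinct: int) -> int:
    """Bytes needed for unsigned indices 0..n_distinct-1 (uint8/uint16/uint32)."""
    if n_distinct <= 1 << 8:
        return 1
    if n_distinct <= 1 << 16:
        return 2
    return 4
```

   With 1,000,000 rows and 40 distinct offsets, `int32` indices take 4,000,080 bytes against 2,000,000 for plain `Int16`, while 8-bit indices take 1,000,080. The full -779..+780 range spans 1560 values, though, so 8-bit indices only work when a given column has few enough distinct offsets.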
   
   We can keep the implementations simple (only primitive encoding for now) and patch them later to support whichever encodings we decide to add to the spec.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
