felipecrv commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2480139536
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -483,6 +483,28 @@ binary values look like.
 
 .. _variant_primitive_type_mapping:
 
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

Review Comment:
   Interesting points! We should expand the spec text here and clarify expectations.

   Since I can see many operations on this array not caring about the two fields, having a validity buffer on the timestamp field could be a simplification in these cases. It would reduce the risk of computation being performed on garbage values if the struct's validity bitmap is being ignored. But a top-level validity buffer is necessary to keep generic code going through columns processing nulls correctly.

   One way we can adapt to this reality is to make a recommendation against validity on the timestamp field and a warning that even when the offset field is not touched, the validity bitmap of the computation's result should come from the struct validity, or, if both have validity buffers, the & of the two bitmaps.

   For the offset column we can recommend the absence of validity bitmap as well (non-nullable) but if a value is null, process it as if it were zero.
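To make the proposed rule concrete, here is a minimal sketch in plain Python. It is only an illustration of the suggestion in the comment above, not spec text and not an Arrow API: the helper names `combine_validity` and `effective_offsets` are hypothetical, lists of booleans stand in for validity bitmaps, and `None` models an absent bitmap (every slot valid).

```python
from typing import List, Optional, Sequence


def combine_validity(
    struct_valid: Optional[Sequence[bool]],
    timestamp_valid: Optional[Sequence[bool]],
    length: int,
) -> List[bool]:
    """Validity of a computation's result (hypothetical helper).

    Uses the struct-level validity; if the timestamp field also
    carries validity, the two are AND-ed slot by slot.
    None stands for "no validity bitmap", i.e. all slots valid.
    """
    if struct_valid is None:
        struct_valid = [True] * length
    if timestamp_valid is None:
        timestamp_valid = [True] * length
    return [s and t for s, t in zip(struct_valid, timestamp_valid)]


def effective_offsets(offsets: Sequence[Optional[int]]) -> List[int]:
    """Offsets in minutes, with a null offset processed as zero."""
    return [0 if offset is None else offset for offset in offsets]


# Four slots: the struct marks slot 1 null, the timestamp field marks slot 2 null.
struct_valid = [True, False, True, True]
timestamp_valid = [True, True, False, True]
offsets_minutes = [-180, 60, None, None]

result_valid = combine_validity(struct_valid, timestamp_valid, 4)
offsets_used = effective_offsets(offsets_minutes)
```

Here slot 3 stays valid even though its offset is null, because the null offset is read as zero rather than propagated as a null result. Slots 1 and 2 are null in the result because either the struct or the timestamp field marks them null.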
