serramatutu commented on code in PR #48002:
URL: https://github.com/apache/arrow/pull/48002#discussion_r2481948785

##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -483,6 +483,28 @@ binary values look like.
 
 .. _variant_primitive_type_mapping:
 
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

Review Comment:
   > Alternatively we could also specify that if one is null, the other should be null as well?

   Yeah, that's more or less what I was thinking about. In principle this type only has meaning if both fields are set. To relax this constraint we'd need to define what a null timestamp with a non-null offset means, and vice versa.

   Could be:
   - If timestamp is set and offset is null, assume `offset=0`, i.e. the timestamp is UTC
   - If timestamp is null and offset is set, assume the whole value is null (a standalone offset floating around has no meaning)

   Or, alternatively:
   - If either field is null, assume the whole value is null as well

   None of these is inherently better or worse IMHO, it's just a matter of standardizing.
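
To make the two options concrete, here is a minimal pure-Python sketch of how a reader might resolve a single value under each rule. It is only an illustration, not part of the PR or of any Arrow API: `TimestampWithOffset`, `resolve_lenient` and `resolve_strict` are hypothetical names, and the timestamp is assumed to be an integer count of time units since the epoch in UTC.

```python
from typing import NamedTuple, Optional


class TimestampWithOffset(NamedTuple):
    # Hypothetical stand-in for one struct value of the proposed extension type.
    timestamp_utc: Optional[int]   # assumed: integer time units since the epoch, UTC
    offset_minutes: Optional[int]  # original UTC offset in minutes


def resolve_lenient(value: TimestampWithOffset) -> Optional[TimestampWithOffset]:
    """First option: a null offset means UTC; a null timestamp nulls the value."""
    if value.timestamp_utc is None:
        # A standalone offset has no meaning, so the whole value is null.
        return None
    offset = 0 if value.offset_minutes is None else value.offset_minutes
    return TimestampWithOffset(value.timestamp_utc, offset)


def resolve_strict(value: TimestampWithOffset) -> Optional[TimestampWithOffset]:
    """Second option: if either field is null, the whole value is null."""
    if value.timestamp_utc is None or value.offset_minutes is None:
        return None
    return value


# The two rules only disagree when the timestamp is set and the offset is null.
partial = TimestampWithOffset(timestamp_utc=1_700_000_000, offset_minutes=None)
lenient_result = resolve_lenient(partial)
strict_result = resolve_strict(partial)
```

For `partial`, the first rule yields a UTC value with `offset_minutes=0` while the second yields null; for every other combination of nulls the two rules agree.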
