serramatutu commented on code in PR #48002: URL: https://github.com/apache/arrow/pull/48002#discussion_r2481948785
##########
docs/source/format/CanonicalExtensions.rst:
##########
@@ -483,6 +483,28 @@ binary values look like.
 .. _variant_primitive_type_mapping:
+Timestamp With Offset
+=============
+This type represents a timestamp column that stores potentially different timezone offsets per value. The timestamp is stored in UTC alongside the original timezone offset in minutes.
+
+* Extension name: ``arrow.timestamp_with_offset``.
+
+* The storage type of the extension is a ``Struct`` with 2 fields, in order:
+
+  * ``timestamp``: a non-nullable ``Timestamp(time_unit, "UTC")``, where ``time_unit`` is any Arrow ``TimeUnit`` (s, ms, us or ns).

Review Comment:

> Alternatively we could also specify that if one is null, the other should be null as well?

Yeah, that's more or less what I was thinking about. In principle this type only has meaning if both fields are set. To relax these constraints we'd need to define what a null timestamp with a non-null offset means, and vice versa.

Could be:

- If the timestamp is set and the offset is null, assume `offset=0`, i.e. the timestamp is UTC
- If the timestamp is null and the offset is set, assume the whole value is null (a standalone offset floating around has no meaning)

Or, alternatively:

- If any of the fields is null, assume the whole value is null as well

The second option has the advantage that we can AND the two validity bitmaps together to get the combined validity buffer for the whole value, and it reduces branching.
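
To make the second option concrete, here is a minimal pure-Python sketch of the bitmap AND. It assumes Arrow-style validity bitmaps (one bit per value, least significant bit first); the names `combine_validity` and `is_valid` are made up for illustration and are not part of any Arrow API, and the struct's own top-level validity bitmap is ignored for simplicity.

```python
def combine_validity(timestamp_valid: bytes, offset_valid: bytes) -> bytes:
    """AND two validity bitmaps byte by byte (hypothetical helper).

    A value is valid only if both its timestamp and its offset are valid.
    """
    if len(timestamp_valid) != len(offset_valid):
        raise ValueError("bitmaps must have the same length")
    return bytes(a & b for a, b in zip(timestamp_valid, offset_valid))


def is_valid(bitmap: bytes, i: int) -> bool:
    """Return True if bit i is set, counting from the least significant bit."""
    return bool((bitmap[i // 8] >> (i % 8)) & 1)


# Four values: the timestamp is null at index 1, the offset is null at index 2.
timestamp_valid = bytes([0b00001101])
offset_valid = bytes([0b00001011])

combined = combine_validity(timestamp_valid, offset_valid)
validity = [is_valid(combined, i) for i in range(4)]
print(validity)  # [True, False, False, True]
```

No per-value branch is needed here, whereas the first option would have to inspect each null offset individually and substitute `offset=0`.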
