felipecrv commented on code in PR #8743:
URL: https://github.com/apache/arrow-rs/pull/8743#discussion_r2631967392
##########
arrow-schema/src/extension/canonical/timestamp_with_offset.rs:
##########
@@ -0,0 +1,530 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Timestamp with an offset in minutes
+//!
+//! <https://arrow.apache.org/docs/format/CanonicalExtensions.html#timestamp-with-offset>
+
+use crate::{ArrowError, DataType, extension::ExtensionType};
+
+/// The extension type for `TimestampWithOffset`.
+///
+/// Extension name: `arrow.timestamp_with_offset`.
+///
+/// This type represents a timestamp column that stores potentially different timezone offsets per value.
+/// The timestamp is stored in UTC alongside the original timezone offset in minutes. This extension type
+/// is intended to be compatible with ANSI SQL's `TIMESTAMP WITH TIME ZONE`, which is supported by multiple
+/// database engines.
+///
+/// The storage type of the extension is a `Struct` with 2 fields, in order:
+/// - `timestamp`: a non-nullable `Timestamp(time_unit, "UTC")`, where `time_unit` is any Arrow `TimeUnit` (s, ms, us or ns).
+/// - `offset_minutes`: a non-nullable signed 16-bit integer (`Int16`) representing the offset in minutes
+/// from the UTC timezone. Negative offsets represent time zones west of UTC, while positive offsets represent
+/// east. Offsets normally range from -779 (-12:59) to +780 (+13:00).
+///
+/// This type has no type parameters.
+///
+/// Metadata is either empty or an empty string.
+///
+/// It is also *permissible* for the `offset_minutes` field to be dictionary-encoded with a preferred (*but not required*)
+/// index type of `int8`, or run-end-encoded with a preferred (*but not required*) runs type of `int8`.
+///
+/// It's worth noting that the data source needs to resolve timezone strings such as `UTC` or
+/// `Americas/Los_Angeles` into an offset in minutes in order to construct a `TimestampWithOffset`.
+/// This makes `TimestampWithOffset` type "lossy" in the sense that any original "unresolved"
+/// timezone string gets lost in this conversion. It's a tradeoff for optimizing the row
+/// representation and simplifying the client code, which does not need to know how to convert
+/// from timezone string to its corresponding offset in minutes.

Review Comment:
   Can you `gq` the paragraphs in Vim, or do the equivalent in your editor? That keeps the lines at around 70 characters, with uniform length.
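
For readers unfamiliar with the `offset_minutes` convention described in the quoted doc comment, here is a minimal sketch of how a signed minute offset relates to the `-12:59` / `+13:00` notation and to a UTC timestamp. It uses only the Rust standard library. The helper names `format_offset` and `local_seconds` are hypothetical and are not part of arrow-rs or of this PR.

```rust
/// Hypothetical helper (not part of arrow-rs): renders an offset in
/// minutes from UTC as a signed `+HH:MM` / `-HH:MM` string.
fn format_offset(offset_minutes: i16) -> String {
    let sign = if offset_minutes < 0 { '-' } else { '+' };
    // Widen to i32 before taking the absolute value so that
    // i16::MIN cannot overflow.
    let abs = i32::from(offset_minutes).abs();
    format!("{}{:02}:{:02}", sign, abs / 60, abs % 60)
}

/// Hypothetical helper: shifts a UTC timestamp (in seconds) by the
/// stored offset to obtain the local wall-clock value (in seconds).
/// Negative offsets (west of UTC) move the value backwards.
fn local_seconds(utc_seconds: i64, offset_minutes: i16) -> i64 {
    utc_seconds + i64::from(offset_minutes) * 60
}
```

Under these assumptions, `format_offset(-779)` yields `"-12:59"` and `format_offset(780)` yields `"+13:00"`, matching the range given in the doc comment, while `local_seconds(0, -480)` yields `-28800` (eight hours before the UTC epoch).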
