On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou <anto...@python.org> wrote:

>
> Hello,
>
> For the first time I notice this piece of information about the
> timestamp type:
>
> >   /// * If the time zone is set to a valid value, values can be displayed as
> >   ///   "localized" to that time zone, even though the underlying 64-bit
> >   ///   integers are identical to the same data stored in UTC. Converting
> >   ///   between time zones is a metadata-only operation and does not change the
> >   ///   underlying values
>
> (from https://github.com/apache/arrow/blob/master/format/Schema.fbs#L223 )
>
> This seems rather weird to me: timestamps always convey a UTC timestamp
> value, optionally decorated with a local timezone?  What is the
> motivation for such a representation?  It is unlike other systems such
> as Python, where a timezone-aware timestamp really expresses a local
> time value, not a UTC time value.
>

Just as a reference: pandas uses the same model of storing UTC timestamps for
timezone-aware data (I think numpy also stored it as UTC, before they
removed support for it). And I think databases like PostgreSQL also store it
as UTC internally, AFAIK.
The Python standard library datetime.datetime indeed stores localized
timestamps. But an important difference is that Python actually stores the
year/month/day/hour/etc. as separate values, so it directly represents an
actual moment in time in a certain timezone. Whereas I think what we store is
considered "unix time" (time elapsed since the epoch, January 1st, 1970 at
UTC)? I am not sure how you would store a timestamp in a certain timezone in
this model.
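
To illustrate the difference, here is a minimal sketch using only the Python
standard library. The fixed UTC+02:00 offset is an assumption chosen for
illustration (it avoids depending on a tz database); it is not taken from
Arrow or pandas.

```python
from datetime import datetime, timedelta, timezone

utc = timezone.utc
plus2 = timezone(timedelta(hours=2))  # hypothetical fixed-offset zone

# A timezone-aware datetime keeps the local wall-clock fields.
local = datetime(2021, 6, 2, 13, 56, tzinfo=plus2)
as_utc = local.astimezone(utc)

# The stored fields differ between the two objects...
print(local.hour, as_utc.hour)

# ...but both describe the same instant, so the "unix time" is identical.
print(local.timestamp() == as_utc.timestamp())
```

The epoch value carries no zone information of its own, which is why a zone
can only be attached to it as separate metadata.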

Some advantages of storing UTC that come to mind: it makes converting from
one timezone to another a trivial (metadata-only) operation, makes it easier
to do timestamp comparisons across timezones, and makes timedelta arithmetic
easier.
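
A rough sketch of these three points, assuming a toy model of "UTC integer
plus timezone metadata". The UtcTimestamp class and its methods are
hypothetical names made up for this example, not Arrow's or pandas' API, and
second resolution is an arbitrary choice.

```python
from dataclasses import dataclass, replace
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class UtcTimestamp:
    """Hypothetical model, not Arrow's API: a UTC epoch value plus a zone label."""
    value: int     # seconds since 1970-01-01T00:00:00 UTC
    tz: timezone   # display-only metadata

    def with_tz(self, tz: timezone) -> "UtcTimestamp":
        # Metadata-only conversion: the stored integer is left untouched.
        return replace(self, tz=tz)

    def display(self) -> str:
        # Localization happens only when rendering the value.
        return datetime.fromtimestamp(self.value, tz=self.tz).isoformat()


utc = timezone.utc
plus2 = timezone(timedelta(hours=2))  # fixed offset, assumed for illustration

a = UtcTimestamp(1622634960, utc)        # 2021-06-02 11:56:00 UTC
b = a.with_tz(plus2)                     # same instant, different label
c = UtcTimestamp(a.value + 3600, plus2)  # one hour later

print(a.display())
print(b.display())
print(a.value == b.value)                    # conversion did not change the value
print(a.value < c.value)                     # comparison ignores the zones
print(timedelta(seconds=c.value - a.value))  # arithmetic is integer subtraction
```

Comparison and subtraction never need to look at the tz field, since every
value is already on the same UTC axis.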

Joris


> Thank you,
>
> Antoine.
>
