On Wed, 2 Jun 2021 at 13:56, Antoine Pitrou <anto...@python.org> wrote:
>
> Hello,
>
> For the first time I notice this piece of information about the
> timestamp type:
>
> >   /// * If the time zone is set to a valid value, values can be displayed as
> >   ///   "localized" to that time zone, even though the underlying 64-bit
> >   ///   integers are identical to the same data stored in UTC. Converting
> >   ///   between time zones is a metadata-only operation and does not change the
> >   ///   underlying values
>
> (from https://github.com/apache/arrow/blob/master/format/Schema.fbs#L223 )
>
> This seems rather weird to me: timestamps always convey a UTC timestamp
> value, optionally decorated with a local timezone?  What is the
> motivation for such a representation?  It is unlike other systems such
> as Python, where a timezone-aware timestamp really expresses a local
> time value, not a UTC time value.
>

Just as a reference: pandas uses the same model of storing UTC timestamps
for timezone-aware data (I think numpy also stored it as UTC, before they
removed support for it). And I think databases like PostgreSQL also store
it as UTC internally, AFAIK.

The Python standard library datetime.datetime indeed stores localized
timestamps. But an important difference is that Python actually stores the
year/month/day/hour/etc. as separate values, so it directly represents an
actual moment in time in a certain timezone. Whereas I think what we store
is considered "unix time"? (time elapsed since the epoch, January 1st, 1970
at 00:00 UTC.) I am not sure how you would store a timestamp in a certain
timezone in this model.

Some advantages of storing UTC that come to mind: it makes converting from
one timezone to another a trivial (metadata-only) operation, it makes
timestamp comparisons across timezones easier, and it makes timedelta
arithmetic easier.

Joris

> Thank you,
>
> Antoine.
>
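
To make the "metadata-only" point above a bit more concrete, here is a rough Python sketch. It is not how Arrow or pandas implement this; the `Timestamp` class and the fixed UTC offsets (used in place of real time zone rules) are made up purely for illustration. The stored value is an integer number of seconds since the Unix epoch, and the time zone only affects how that value is displayed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical model: a UTC-based integer plus display-only tz metadata."""

    epoch_seconds: int  # seconds since 1970-01-01T00:00:00 UTC
    tz: timezone        # metadata, never changes the stored integer

    def with_tz(self, tz: timezone) -> "Timestamp":
        # "Converting" between time zones only swaps the metadata.
        return Timestamp(self.epoch_seconds, tz)

    def localized(self) -> datetime:
        # Wall-clock fields are derived only when the value is displayed.
        return datetime.fromtimestamp(self.epoch_seconds, tz=self.tz)


# Fixed offsets stand in for real zones here (an assumption of this sketch).
PARIS_SUMMER = timezone(timedelta(hours=2), "CEST")
NEW_YORK_SUMMER = timezone(timedelta(hours=-4), "EDT")

# The same instant, 2021-06-02 11:56 UTC, stored once as an integer.
instant = int(datetime(2021, 6, 2, 11, 56, tzinfo=timezone.utc).timestamp())

paris = Timestamp(instant, PARIS_SUMMER)
new_york = paris.with_tz(NEW_YORK_SUMMER)

print(paris.localized().isoformat())
print(new_york.localized().isoformat())

# Cross-timezone comparison and differences are plain integer operations.
later = Timestamp(instant + 3600, NEW_YORK_SUMMER)
print(later.epoch_seconds - paris.epoch_seconds)
```

The two printed datetimes show different wall-clock times for the same underlying integer, and the comparison at the end never needs to look at the time zone at all.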