john-bodley commented on a change in pull request #19056: URL: https://github.com/apache/superset/pull/19056#discussion_r821277333
##########
File path: docs/docs/miscellaneous/timezones.mdx
##########
@@ -0,0 +1,50 @@
+---
+title: Timezones
+hide_title: true
+sidebar_position: 1
+version: 1
+---
+
+## Timezones
+
+There are four distinct timezone components which relate to Apache Superset:
+
+1. The timezone that the underlying data is encoded in.
+2. The timezone of the database engine.
+3. The timezone of the Apache Superset backend.
+4. The timezone of the Apache Superset client.
+
+If a temporal field (`DATETIME`, `TIME`, `TIMESTAMP`, etc.) does not explicitly define a timezone, it defaults to the underlying timezone of the component.
+
+Apache Superset has no control over either how the data is ingested (1) or the timezone of the client (4). To make the problem somewhat tractable, it is highly recommended, from a consistency standpoint, that both (2) and (3) be configured to use the same timezone, with a strong preference given to [UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time), to ensure temporal fields without an explicit timezone are not incorrectly coerced into the wrong timezone. In fact, Apache Superset currently has implicit assumptions that timestamps are in UTC, and thus configuring (3) to a non-UTC timezone could be problematic.
+
+To strive for data consistency (regardless of the timezone of the client), the Apache Superset backend tries to ensure that any timestamp sent to the client has an explicit timezone encoded within, or a semi-explicit one, as in the case of [Epoch time](https://en.wikipedia.org/wiki/Unix_time), which is always in reference to UTC.
+
+The challenge, however, lies with the slew of [database engines](/docs/databases/installing-database-drivers#install-database-drivers) which Apache Superset supports and the various inconsistencies between their [Python Database API (DB API)](databases/installing-database-drivers#install-database-drivers) implementations, combined with the fact that we use [Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to serializing to JSON. Regrettably, Pandas ignores the DB API [type_code](https://www.python.org/dev/peps/pep-0249/#type-objects), relying by default on the underlying Python type returned by the DB API. Currently only a subset of the supported database engines work correctly with Pandas, i.e., ensure that timestamps without an explicit timezone are serialized to JSON with the server timezone, thus guaranteeing the client will display timestamps in a consistent manner irrespective of the client's timezone.

Review comment:
This paragraph and the following example likely don't belong here, i.e., they might be better suited for `CONTRIBUTING.md` or similar. Additionally, I'm really not sure what the best fix is. I'm concerned about adding yet more logic to Superset to specially handle these nuances, especially when the underlying issue resides with Pandas and/or the underlying DB APIs. Per https://github.com/trinodb/trino-python-client/pull/160 it seems like Trino is working on this.

--
This is an automated message from the Apache Git Service.
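The four timezone components listed in the quoted doc matter because a timestamp without a timezone only becomes a point in time once some timezone is assumed for it. The sketch below uses only the Python standard library and is not Superset code: it shows the same naive wall-clock value mapping to two different epoch values depending on whether it is read as UTC or as a fixed UTC-08:00 offset. The helper name `naive_to_epoch_ms` and the sample date are illustrative assumptions.

```python
from datetime import datetime, timedelta, timezone


def naive_to_epoch_ms(naive_dt: datetime, assumed_tz: timezone) -> int:
    """Interpret a naive datetime in assumed_tz and return epoch milliseconds.

    Hypothetical helper, for illustration only.
    """
    if naive_dt.tzinfo is not None:
        raise ValueError("expected a naive datetime")
    aware = naive_dt.replace(tzinfo=assumed_tz)
    return int(aware.timestamp() * 1000)


# A naive value, e.g. what a DB API might return for a column without a timezone.
naive = datetime(2022, 3, 8, 12, 0, 0)

utc_ms = naive_to_epoch_ms(naive, timezone.utc)
# A fixed offset is used to avoid a tzdata dependency; real zones also have DST rules.
minus8_ms = naive_to_epoch_ms(naive, timezone(timedelta(hours=-8)))

print(utc_ms)              # 1646740800000
print(minus8_ms - utc_ms)  # 28800000, i.e. 8 hours in milliseconds
```

If the database engine (2) and the backend (3) interpret naive values in different timezones, the same stored value silently shifts by the offset between them, which is why the doc recommends configuring both to UTC.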

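The backend behaviour described in the quoted doc, sending every timestamp to the client with an explicit or semi-explicit (epoch) timezone, can be sketched as a `json.dumps` `default` hook that attaches an assumed server timezone to naive values before converting them to epoch milliseconds. This is a minimal sketch assuming the server timezone is UTC, as the doc recommends; `SERVER_TZ`, `to_epoch_ms` and `json_default` are hypothetical names, and this is not how Superset or Pandas actually serialize query results.

```python
import json
from datetime import datetime, timedelta, timezone

# Assumption: the Superset backend (3) runs in UTC, per the doc's recommendation.
SERVER_TZ = timezone.utc


def to_epoch_ms(value: datetime, server_tz: timezone = SERVER_TZ) -> int:
    """Return epoch milliseconds, treating naive datetimes as server_tz."""
    if value.tzinfo is None or value.utcoffset() is None:
        value = value.replace(tzinfo=server_tz)
    return int(value.timestamp() * 1000)


def json_default(obj):
    """json.dumps hook: encode datetimes as epoch ms, which is always relative to UTC."""
    if isinstance(obj, datetime):
        return to_epoch_ms(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


rows = [
    {"ts": datetime(2022, 3, 8, 12, 0)},  # naive: assumed to be in SERVER_TZ
    {"ts": datetime(2022, 3, 8, 4, 0, tzinfo=timezone(timedelta(hours=-8)))},  # aware
]

payload = json.dumps(rows, default=json_default)
print(payload)  # [{"ts": 1646740800000}, {"ts": 1646740800000}]
```

Both rows describe the same instant, so the client (4) can render them identically whatever its own timezone. The sketch cannot recover a timezone that the DB API has already dropped: a naive value is simply assumed to be in `SERVER_TZ`, which is exactly where inconsistent DB API implementations cause trouble.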