john-bodley commented on a change in pull request #19056:
URL: https://github.com/apache/superset/pull/19056#discussion_r821277333



##########
File path: docs/docs/miscellaneous/timezones.mdx
##########
@@ -0,0 +1,50 @@
+---
+title: Timezones
+hide_title: true
+sidebar_position: 1
+version: 1
+---
+
+## Timezones
+
+There are four distinct timezone components which relate to Apache Superset:
+
+1. The timezone that the underlying data is encoded in.
+2. The timezone of the database engine.
+3. The timezone of the Apache Superset backend.
+4. The timezone of the Apache Superset client.
+
+If a temporal field (`DATETIME`, `TIME`, `TIMESTAMP`, etc.) does not 
explicitly define a timezone, it defaults to the underlying timezone of the 
component.
+
+To help make the problem somewhat tractable, given that Apache Superset has no 
control over either how the data is ingested (1) or the timezone of the client 
(4), it is highly recommended from a consistency standpoint that both (2) and 
(3) are configured to use the same timezone, with a strong preference given to 
[UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time) to ensure 
temporal fields without an explicit timezone are not incorrectly coerced into 
the wrong timezone. In fact, Apache Superset currently has implicit assumptions 
that timestamps are in UTC, and thus configuring (3) to a non-UTC timezone 
could be problematic.
+
+To strive for data consistency (regardless of the timezone of the client), the 
Apache Superset backend tries to ensure that any timestamp sent to the client 
has an explicit (or semi-explicit, as is the case with [Epoch 
time](https://en.wikipedia.org/wiki/Unix_time), which is always in reference to 
UTC) timezone encoded within.
+
+The challenge, however, lies with the slew of [database 
engines](/docs/databases/installing-database-drivers#install-database-drivers) 
which Apache Superset supports and the various inconsistencies between their 
[Python Database API (DB 
API)](databases/installing-database-drivers#install-database-drivers) 
implementations, combined with the fact that we use 
[Pandas](https://pandas.pydata.org/) to read SQL into a DataFrame prior to 
serializing to JSON. Regrettably, Pandas ignores the DB API 
[type_code](https://www.python.org/dev/peps/pep-0249/#type-objects), relying by 
default on the underlying Python type returned by the DB API. Currently only a 
subset of the supported database engines work correctly with Pandas, i.e., 
ensure that timestamps without an explicit timezone are serialized to JSON with 
the server timezone, thus guaranteeing the client will display timestamps in a 
consistent manner irrespective of the client's timezone.

Review comment:
       This paragraph and the following example likely don't belong here, 
i.e., they might be best suited for `CONTRIBUTING.md` or similar. Additionally, 
I'm really not sure what the best fix is. I'm concerned about adding yet more 
logic to Superset to special-case these nuances, especially when the underlying 
issue resides with Pandas and/or the underlying DB APIs. Per 
https://github.com/trinodb/trino-python-client/pull/160 it seems like Trino is 
working on this.
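
For readers less familiar with the naive-versus-aware distinction the quoted 
paragraph relies on, here is a minimal standard-library sketch. It is not 
Superset or Pandas code: `to_explicit_utc` is a hypothetical helper, and 
treating naive values as UTC is an assumption that simply mirrors the UTC 
recommendation in the doc.

```python
import json
from datetime import datetime, timezone


def to_explicit_utc(value: datetime) -> datetime:
    """Hypothetical helper: return an aware datetime expressed in UTC."""
    if value.tzinfo is None:
        # Assumption: a naive value is already expressed in UTC.
        return value.replace(tzinfo=timezone.utc)
    # An aware value is converted, preserving the instant it represents.
    return value.astimezone(timezone.utc)


# What a DB API driver might hand back for a column without timezone info.
naive = datetime(2022, 3, 8, 12, 0, 0)
aware = to_explicit_utc(naive)

# No offset in the string, so each client has to guess the timezone.
naive_json = json.dumps({"ts": naive.isoformat()})

# Explicit offset, so the value is unambiguous for any client.
aware_json = json.dumps({"ts": aware.isoformat()})

# Epoch milliseconds are semi-explicit: always in reference to UTC.
epoch_ms = int(aware.timestamp() * 1000)
```

The only difference between the two JSON payloads is the trailing offset, which 
is what lets clients in different timezones render the same instant.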




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


