mikebridge opened a new pull request, #40918:
URL: https://github.com/apache/superset/pull/40918

   > **⚠️ Proof of concept — not for merge.** Opening as a **draft** to gather 
early design feedback on the approach. Scope is intentionally narrow 
(PostgreSQL + Apache Impala, behind an off-by-default flag) and several 
follow-ups are listed under *Known limitations*.
   
   ### SUMMARY
   
   A proof of concept for **dataset-level time zone support** (sc-108351), 
driven by a customer that stores all event data as **UTC epoch** and needs it 
displayed/analysed in a fixed local zone (e.g. on-premises deployments in New 
Zealand).
   
   Behind a new off-by-default feature flag `DATASET_PRESENTATION_TIMEZONE`, a 
dataset can be configured with a single **presentation time zone** (an IANA 
name). When set, the dataset's temporal columns are:
   
   - **bucketed** by time grain in that zone (DST-correct), so a "by day" 
chart's days are *local* days, not UTC days;
   - **filtered** in that zone — and crucially, the time-range filter **shifts 
the boundary rather than wrapping the column**, so index/partition pruning is 
preserved (important at the driving customer's scale of billions of rows/hour 
on hourly-UTC-partitioned tables);
   - handled for **epoch-stored** timestamps (`epoch_s`/`epoch_ms`) as well as 
zone-aware and naive timestamp columns.
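
   To illustrate why shifting the boundary preserves pruning, here is a minimal Python sketch. It is not the PR's implementation (the PR emits engine SQL through `presentation_timezone_bound`); the helper names, the column name `event_ts`, and the year 2024 are assumptions made only for this example. The local-day bounds are converted to UTC epoch seconds once, so the stored column is compared as-is instead of being wrapped in a conversion function.

```python
from datetime import datetime
from zoneinfo import ZoneInfo


def local_range_to_epoch_s(start_local: str, end_local: str, zone: str) -> tuple[int, int]:
    """Interpret naive ISO bounds as wall-clock time in `zone`; return UTC epoch seconds."""
    tz = ZoneInfo(zone)
    start = datetime.fromisoformat(start_local).replace(tzinfo=tz)
    end = datetime.fromisoformat(end_local).replace(tzinfo=tz)
    return int(start.timestamp()), int(end.timestamp())


def shifted_predicate(column: str, start_local: str, end_local: str, zone: str) -> str:
    """Build a predicate on the raw epoch column (hypothetical helper).

    The column itself is untouched, so an index or partition on it stays usable.
    `column` is assumed to be a trusted identifier.
    """
    lo, hi = local_range_to_epoch_s(start_local, end_local, zone)
    return f"{column} >= {lo} AND {column} < {hi}"


# One local day in Auckland, expressed against an epoch_s column.
predicate = shifted_predicate(
    "event_ts", "2024-06-15T00:00:00", "2024-06-16T00:00:00", "Pacific/Auckland"
)
print(predicate)
```

   The alternative, converting `event_ts` to local time inside the `WHERE` clause, would apply a function to every row and typically defeats partition pruning on the stored value.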
   
   Key design decisions:
   
   - **Per-dataset presentation zone** + a **dataset-level `source_timezone`** 
(the zone naive/zone-less stored data is in) with an optional **per-column 
override**. All viewers of a chart see identical data — there is no per-user 
zone (out of scope, by design).
   - **Engine capability** (`BaseEngineSpec.supports_presentation_timezone`) 
with per-engine SQL strategies (`presentation_timezone_column` / 
`presentation_timezone_bound`). PostgreSQL (`AT TIME ZONE`) and Apache Impala 
(`FROM_UTC_TIMESTAMP`/`TO_UTC_TIMESTAMP`) are implemented; all other engines 
are gated off.
   - The zone name is **string-templated into SQL**, so an IANA allowlist 
(`superset/utils/timezones.py`) validated at the SQL boundary is the 
load-bearing injection control.
   - **Inert when the flag is off** — `_presentation_timezone()` short-circuits 
on the flag before any relationship/column access, and tests assert 
byte-identical SQL for the off path.
   - Additive/nullable/reversible migrations (up/down dry-run verified on 
SQLite **and** PostgreSQL).
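
   The allowlist-then-template pattern from the design decisions above can be sketched as follows. This is an assumption-laden illustration, not the contents of `superset/utils/timezones.py`: the function names `validate_zone` and `postgres_bucket_expr`, the grain set, and the use of Python's `zoneinfo.available_timezones()` as the allowlist source are all hypothetical. The SQL shape assumes a zone-aware PostgreSQL column, where `AT TIME ZONE` yields local wall-clock time.

```python
from zoneinfo import available_timezones

# Snapshot of the IANA names known to this Python installation (assumed allowlist source).
_ALLOWED_ZONES = frozenset(available_timezones())
_ALLOWED_GRAINS = frozenset({"hour", "day", "week", "month"})


def validate_zone(name: str) -> str:
    """Return `name` only if it is an exact member of the allowlist."""
    if name not in _ALLOWED_ZONES:
        raise ValueError(f"Unknown IANA time zone: {name!r}")
    return name


def postgres_bucket_expr(column: str, zone: str, grain: str = "day") -> str:
    """Template a local-time bucketing expression (hypothetical helper).

    Validation happens immediately before string interpolation, so nothing
    outside the allowlist can reach the SQL text. `column` is assumed to be
    a trusted, already-quoted identifier.
    """
    if grain not in _ALLOWED_GRAINS:
        raise ValueError(f"Unsupported grain: {grain!r}")
    zone = validate_zone(zone)
    return f"DATE_TRUNC('{grain}', {column} AT TIME ZONE '{zone}')"


print(postgres_bucket_expr("event_ts", "America/New_York"))
```

   Because membership is an exact string match against known names, a value containing quotes or SQL fragments is rejected before it is ever interpolated.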
   
   > Generated with the assistance of Claude Code (Anthropic) as an exploratory 
PoC; co-authorship is noted in the commit trailer. Please review accordingly.
   
   ### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
   
   Demo (three "count by day" charts over the *same* 24 UTC-epoch events, one 
per hour across a single UTC day):
   
   | UTC (baseline) | America/New_York (UTC−4) | Pacific/Auckland (UTC+12) |
   |---|---|---|
   | Jun 15 = 24 | Jun 14 = 4, Jun 15 = 20 | Jun 15 = 12, Jun 16 = 12 |
   
   Same data, one dataset setting → three different local calendars. 
(Screenshots to be attached.)
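
   The counts in the table can be checked independently of Superset with a short Python sketch. The year 2024 is an assumption (the demo only states June 15), and `count_by_local_day` is a hypothetical helper, not code from this PR.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# 24 events, one per hour across a single UTC day (year assumed).
start = datetime(2024, 6, 15, tzinfo=timezone.utc)
events = [start + timedelta(hours=h) for h in range(24)]


def count_by_local_day(events: list[datetime], zone: str) -> dict[str, int]:
    """Group aware datetimes by their calendar date in `zone`."""
    tz = ZoneInfo(zone)
    counts = Counter(e.astimezone(tz).date().isoformat() for e in events)
    return dict(sorted(counts.items()))


for zone in ("UTC", "America/New_York", "Pacific/Auckland"):
    print(zone, count_by_local_day(events, zone))
```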
   
   ### TESTING INSTRUCTIONS
   
   1. Enable the flag: `FEATURE_FLAGS = {"DATASET_PRESENTATION_TIMEZONE": 
True}` and run `superset db upgrade`.
   2. On a **PostgreSQL** dataset with a temporal column, open **Edit → 
Settings → Advanced** and set a **Presentation time zone** (e.g. 
`America/New_York`).
   3. Build a time-series "Count by day" chart; confirm the buckets shift to 
the local day, and that clearing the zone reverts to UTC.
   4. Unit tests: `pytest tests/unit_tests/db_engine_specs/test_postgres.py 
tests/unit_tests/db_engine_specs/test_impala.py 
tests/unit_tests/models/helpers_test.py tests/unit_tests/datasets/ -q` and the 
FE Jest test `DatasourceEditorPresentationTimezone.test.tsx`.
   
   ### Known limitations (intentional for this PoC)
   
   - Raw, ungrouped timestamp columns are still displayed as stored (only grain 
bucketing + the time-range control are zoned).
   - Explicit comparison filters (`=`, `<`, `>`, …) on a temporal column are 
not yet zone-shifted (only the time-range control is).
   - Relative-time functions ("last hour", "today") are not yet anchored to 
*now* in the configured zone.
   - Sub-day grains across a DST transition inherit the database's resolution 
of the missing/duplicated hour.
   - Virtual/calculated (expression) columns and SQL Lab are out of scope.
   - A SIP is required before the flag can default on (this change does not 
flip it).
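
   The DST limitation above can be made concrete with a small Python sketch (an illustration only, using an assumed date: New Zealand's clocks went back on 7 April 2024). Two distinct UTC hours map to the same local wall-clock hour, so an hourly bucket keyed on local time holds two hours of data; how a given database labels that hour is what the PoC inherits.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("Pacific/Auckland")

# Four consecutive UTC hours spanning the end of NZ daylight saving time.
start = datetime(2024, 4, 6, 12, tzinfo=timezone.utc)
utc_events = [start + timedelta(hours=h) for h in range(4)]

# Bucket by local wall-clock hour, dropping the offset as a naive bucket key would.
local_hours = Counter(
    e.astimezone(tz).replace(tzinfo=None).isoformat() for e in utc_events
)

for hour, n in sorted(local_hours.items()):
    print(hour, n)
```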
   
   ### ADDITIONAL INFORMATION
   
   - [ ] Has associated issue:
   - [x] Required feature flags: `DATASET_PRESENTATION_TIMEZONE`
   - [x] Changes UI
   - [x] Includes DB Migration (follow approval process in 
[SIP-59](https://github.com/apache/superset/issues/13351))
     - [x] Migration is atomic, supports rollback & is backwards-compatible
     - [x] Confirm DB migration upgrade and downgrade tested
     - [x] Runtime estimates and downtime expectations provided (three nullable 
column adds, metadata-only, no backfill — effectively instant)
   - [x] Introduces new feature or API
   - [ ] Removes existing feature or API
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

