mikebridge opened a new pull request, #40918:
URL: https://github.com/apache/superset/pull/40918
> **⚠️ Proof of concept — not for merge.** Opening as a **draft** to gather
early design feedback on the approach. Scope is intentionally narrow
(PostgreSQL + Apache Impala, behind an off-by-default flag) and several
follow-ups are listed under *Known limitations*.
### SUMMARY
A proof of concept for **dataset-level time zone support** (sc-108351),
driven by a customer that stores all event data as **UTC epoch** and needs it
displayed/analysed in a fixed local zone (e.g. on-premises deployments in New
Zealand).
Behind a new off-by-default feature flag `DATASET_PRESENTATION_TIMEZONE`, a
dataset can be configured with a single **presentation time zone** (an IANA
name). When set, the dataset's temporal columns are:
- **bucketed** by time grain in that zone (DST-correct), so a "by day"
chart's days are *local* days, not UTC days;
- **filtered** in that zone — and crucially, the time-range filter **shifts
the boundary rather than wrapping the column**, so index/partition pruning is
preserved (important at the driving customer's scale of billions of rows/hour
on hourly-UTC-partitioned tables);
- handled for **epoch-stored** timestamps (`epoch_s`/`epoch_ms`) as well as
zone-aware and naive timestamp columns.
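To illustrate the "shift the boundary rather than wrap the column" idea, here is a minimal sketch, not the code in this PR. It converts a local-day range into UTC epoch bounds so the predicate compares the bare column against constants. The helper name, the column name `event_epoch_s`, and the year 2024 are assumptions made for the example.

```python
from datetime import date, datetime, time
from zoneinfo import ZoneInfo


def local_day_range_to_epoch_s(start_day: date, end_day: date, zone: str) -> tuple[int, int]:
    """Convert the half-open local-day range [start_day, end_day) to UTC epoch seconds.

    Hypothetical helper for illustration; it is not part of Superset.
    """
    tz = ZoneInfo(zone)
    # Local midnight in the presentation zone, expressed as an absolute instant.
    start = datetime.combine(start_day, time.min, tzinfo=tz)
    end = datetime.combine(end_day, time.min, tzinfo=tz)
    return int(start.timestamp()), int(end.timestamp())


# "June 15, local time in Auckland" becomes two UTC constants.
lo, hi = local_day_range_to_epoch_s(date(2024, 6, 15), date(2024, 6, 16), "Pacific/Auckland")

# The column itself is untouched, so the database can still prune on it.
# `event_epoch_s` is an assumed column name.
predicate = f"event_epoch_s >= {lo} AND event_epoch_s < {hi}"
print(predicate)
```

Wrapping the column instead (converting every row to local time before comparing) would generally prevent the planner from using an index or partition key on the raw column, which is the cost this approach avoids.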
Key design decisions:
- **Per-dataset presentation zone** + a **dataset-level `source_timezone`**
(the zone naive/zone-less stored data is in) with an optional **per-column
override**. All viewers of a chart see identical data — there is no per-user
zone (out of scope, by design).
- **Engine capability** (`BaseEngineSpec.supports_presentation_timezone`)
with per-engine SQL strategies (`presentation_timezone_column` /
`presentation_timezone_bound`). PostgreSQL (`AT TIME ZONE`) and Apache Impala
(`FROM_UTC_TIMESTAMP`/`TO_UTC_TIMESTAMP`) are implemented; all other engines
are gated off.
- The zone name is **string-templated into SQL**, so an IANA allowlist
(`superset/utils/timezones.py`) validated at the SQL boundary is the
load-bearing injection control.
- **Inert when the flag is off** — `_presentation_timezone()` short-circuits
on the flag before any relationship/column access, and tests assert
byte-identical SQL for the off path.
- Additive/nullable/reversible migrations (up/down dry-run verified on
SQLite **and** PostgreSQL).
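As a rough sketch of why the allowlist matters when the zone name is templated into SQL, the example below validates the name against the IANA names known to Python's `zoneinfo` before building a PostgreSQL expression. The function names and the `DATE_TRUNC` shape are assumptions for illustration; the actual allowlist lives in `superset/utils/timezones.py` and the actual SQL is produced by the engine spec methods named above.

```python
from zoneinfo import available_timezones

# Assumed allowlist source: every IANA key visible to this Python install.
_ALLOWED_ZONES = frozenset(available_timezones())


def validate_zone(name: str) -> str:
    """Return `name` unchanged if it is a known IANA zone, else raise ValueError."""
    if name not in _ALLOWED_ZONES:
        raise ValueError(f"Unknown or disallowed time zone: {name!r}")
    return name


def postgres_local_day_bucket(column: str, zone: str) -> str:
    """Hypothetical helper: bucket a timestamptz column by local day.

    `column` is assumed to be a trusted, already-quoted identifier;
    only the zone name is checked here.
    """
    zone = validate_zone(zone)
    return f"DATE_TRUNC('day', {column} AT TIME ZONE '{zone}')"


print(postgres_local_day_bucket("event_ts", "America/New_York"))

try:
    postgres_local_day_bucket("event_ts", "UTC'; DROP TABLE events; --")
except ValueError as exc:
    print("rejected:", exc)
```

Because the check is an exact membership test rather than an escaping step, any string that is not a known zone name is rejected before it reaches the SQL text.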
> Generated with the assistance of Claude Code (Anthropic) as an exploratory
PoC; co-authorship is noted in the commit trailer. Please review accordingly.
### BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
Demo (three "count by day" charts over the *same* 24 UTC-epoch events, one
per hour across a single UTC day):
| UTC (baseline) | America/New_York (−4) | Pacific/Auckland (+12) |
|---|---|---|
| Jun 15 = 24 | Jun 14 = 4, Jun 15 = 20 | Jun 15 = 12, Jun 16 = 12 |
Same data, one dataset setting → three different local calendars.
(Screenshots to be attached.)
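The demo counts can be checked independently of Superset with a short script. This is only a sketch of the arithmetic: the year 2024 is an assumption (the demo specifies only the day), and `count_by_local_day` is a hypothetical helper.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# 24 events, one per hour across a single UTC day (year assumed).
start = datetime(2024, 6, 15, tzinfo=timezone.utc)
events = [start + timedelta(hours=h) for h in range(24)]


def count_by_local_day(events: list[datetime], zone: str) -> dict[str, int]:
    """Count events per calendar day as seen in `zone`."""
    tz = ZoneInfo(zone)
    return dict(Counter(e.astimezone(tz).date().isoformat() for e in events))


for zone in ("UTC", "America/New_York", "Pacific/Auckland"):
    print(zone, count_by_local_day(events, zone))
```

The per-zone counts it prints should match the three columns of the table above.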
### TESTING INSTRUCTIONS
1. Enable the flag: `FEATURE_FLAGS = {"DATASET_PRESENTATION_TIMEZONE":
True}` and run `superset db upgrade`.
2. On a **PostgreSQL** dataset with a temporal column, open **Edit →
Settings → Advanced** and set a **Presentation time zone** (e.g.
`America/New_York`).
3. Build a time-series "Count by day" chart; confirm the buckets shift to
the local day, and that clearing the zone reverts to UTC.
4. Unit tests: `pytest tests/unit_tests/db_engine_specs/test_postgres.py
tests/unit_tests/db_engine_specs/test_impala.py
tests/unit_tests/models/helpers_test.py tests/unit_tests/datasets/ -q` and the
FE Jest test `DatasourceEditorPresentationTimezone.test.tsx`.
### Known limitations (intentional for this PoC)
- Raw, ungrouped timestamp columns are still displayed as stored (only grain
bucketing + the time-range control are zoned).
- Explicit comparison filters (`=`, `<`, `>`, …) on a temporal column are
not yet zone-shifted (only the time-range control is).
- Relative-time functions ("last hour", "today") are not yet anchored to
*now* in the configured zone.
- Sub-day grains across a DST transition inherit the database's resolution
of the missing/duplicated hour.
- Virtual/calculated (expression) columns and SQL Lab are out of scope.
- A SIP is required before the flag can default on (this change does not
flip it).
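For context on the DST limitation, the sketch below shows the duplicated hour at the end of New Zealand daylight time. The date (7 April 2024) is chosen for illustration and is not taken from this PR. Two distinct UTC instants map to the same local wall-clock hour, so an hourly bucket built from local wall-clock time alone cannot tell them apart.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

tz = ZoneInfo("Pacific/Auckland")

# Two UTC instants one hour apart, straddling the end of daylight time.
first = datetime(2024, 4, 6, 13, 0, tzinfo=timezone.utc).astimezone(tz)
second = datetime(2024, 4, 6, 14, 0, tzinfo=timezone.utc).astimezone(tz)

# Same wall-clock label, different UTC offsets.
first_label = first.strftime("%Y-%m-%d %H:%M")
second_label = second.strftime("%Y-%m-%d %H:%M")

print(first_label, first.utcoffset())
print(second_label, second.utcoffset())
print("same local hour:", first_label == second_label)
```

How a given database assigns rows from such an hour to buckets depends on its own time zone functions, which is why the PoC defers to the engine here.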
### ADDITIONAL INFORMATION
- [ ] Has associated issue:
- [x] Required feature flags: `DATASET_PRESENTATION_TIMEZONE`
- [x] Changes UI
- [x] Includes DB Migration (follow approval process in
[SIP-59](https://github.com/apache/superset/issues/13351))
- [x] Migration is atomic, supports rollback & is backwards-compatible
- [x] Confirm DB migration upgrade and downgrade tested
- [x] Runtime estimates and downtime expectations provided (three nullable
column adds, metadata-only, no backfill — effectively instant)
- [x] Introduces new feature or API
- [ ] Removes existing feature or API