Purushottam Sinha created FLINK-39745:
-----------------------------------------
Summary: Web dashboard: surface watermark lag as a per-job timeline
Key: FLINK-39745
URL: https://issues.apache.org/jira/browse/FLINK-39745
Project: Flink
Issue Type: New Feature
Components: Runtime / Web Frontend
Reporter: Purushottam Sinha
Problem
The current per-operator Watermarks drawer
(flink-runtime-web/web-dashboard/src/app/pages/job/overview/watermarks/job-overview-drawer-watermarks.component.ts)
shows only the latest raw epoch-millis watermark per subtask. Users must
mentally subtract it from wall-clock time and re-poll to see whether event
time is keeping up. The trend, which is the actual diagnostic signal, is
invisible.
Evidence
- Drawer table renders one row per subtask showing only the latest
currentInputWatermark; previous values are discarded on each refresh.
- Peer systems (Dataflow "system lag", Materialize lag metric, Confluent
Cloud Flink) ship a per-stage lag-over-time chart for this data.
- All required data is already exposed via GET
/jobs/:id/vertices/:vid/watermarks, so no backend change is needed.
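
As a rough illustration of how lag could be derived client-side from that endpoint, the sketch below assumes the response is a JSON array of {id, value} entries, one per subtask, with value holding epoch millis as a string, and that a subtask without a watermark reports Long.MIN_VALUE. The names WatermarkMetric and vertexLagMs are hypothetical, and ignoring watermark-less subtasks is a choice made for the sketch, not existing dashboard behavior.

```typescript
// Assumed shape of one response entry: id like "0.currentInputWatermark",
// value is the watermark in epoch millis, serialized as a string.
interface WatermarkMetric {
  id: string;
  value: string;
}

// Assumption: Long.MIN_VALUE (-2^63) means "no watermark received yet".
const NO_WATERMARK = -Math.pow(2, 63);

/**
 * Returns the vertex-level lag in milliseconds, or null when no subtask
 * reports a usable watermark (the case the chart would show as "Idle").
 */
function vertexLagMs(metrics: WatermarkMetric[], nowMs: number): number | null {
  const watermarks = metrics
    .map(m => Number(m.value))
    .filter(w => isFinite(w) && w > NO_WATERMARK);
  if (watermarks.length === 0) {
    return null;
  }
  // The slowest subtask (minimum watermark) determines the vertex lag.
  const minWatermark = Math.min.apply(null, watermarks);
  // Clamp at zero in case a watermark runs ahead of the browser clock.
  return Math.max(0, nowMs - minWatermark);
}
```

Calling vertexLagMs(response, Date.now()) on each poll would yield one sample per vertex for the timeline.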
Proposed fix
- New job-level "Watermarks" tab: multi-operator lag-over-time chart with a
configurable alert band, plus per-vertex status cards (current lag, trend,
healthy/tracked/elevated/idle) sorted worst-first.
- Inline lag sparkline next to each operator name on the overview operator
list.
- Opt-in localStorage persistence of the rolling 30-min buffer so the chart
survives a page refresh for the same job ID.
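
One possible shape for the persisted rolling buffer is sketched below. It assumes a single storage entry tagged with the job ID; the key name, the LagSample type, and the helper functions are all hypothetical. The KeyValueStore interface is the subset of the Web Storage API that the sketch needs, so window.localStorage could be passed in directly.

```typescript
interface LagSample {
  t: number;            // wall-clock time of the poll, epoch millis
  lagMs: number | null; // null marks a sample taken while the vertex was idle
}

// Samples per vertex ID.
type LagHistory = { [vertexId: string]: LagSample[] };

// Subset of the Web Storage API used here.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const WINDOW_MS = 30 * 60 * 1000;         // rolling 30-minute window
const STORAGE_KEY = "flink.watermarkLag"; // hypothetical key name

// Drops samples older than the window relative to nowMs.
function prune(samples: LagSample[], nowMs: number): LagSample[] {
  return samples.filter(s => nowMs - s.t <= WINDOW_MS);
}

// Appends a sample for one vertex and trims that vertex's buffer.
function record(history: LagHistory, vertexId: string, sample: LagSample): void {
  const kept = prune(history[vertexId] || [], sample.t);
  kept.push(sample);
  history[vertexId] = kept;
}

function saveHistory(store: KeyValueStore, jobId: string, history: LagHistory): void {
  store.setItem(STORAGE_KEY, JSON.stringify({ jobId: jobId, samples: history }));
}

// Restores the buffer for jobId; a different job ID or a corrupt entry
// clears the stored data and yields an empty history.
function loadHistory(store: KeyValueStore, jobId: string, nowMs: number): LagHistory {
  const raw = store.getItem(STORAGE_KEY);
  if (raw === null) {
    return {};
  }
  try {
    const parsed = JSON.parse(raw) as { jobId: string; samples: LagHistory };
    if (parsed.jobId === jobId) {
      const restored: LagHistory = {};
      for (const vertexId of Object.keys(parsed.samples)) {
        restored[vertexId] = prune(parsed.samples[vertexId], nowMs);
      }
      return restored;
    }
  } catch (e) {
    // Malformed data falls through to the cleanup below.
  }
  store.removeItem(STORAGE_KEY);
  return {};
}
```

Keeping a single entry rather than one per job means a job-ID change naturally evicts the old buffer, which bounds storage use.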
Acceptance
- New tab plots one line per vertex with lag in seconds; idle vertices render
as "Idle" rather than infinite lag.
- With the "Remember across refresh" toggle on, the chart history survives
Cmd+R; it is cleared on job-ID change.
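
The display rule for idle vertices and the worst-first ordering of the status cards might look like the sketch below. The VertexLag type and both function names are hypothetical, a null lag stands for an idle vertex, and placing idle vertices last is an assumption rather than something this issue specifies.

```typescript
interface VertexLag {
  name: string;
  lagMs: number | null; // null means the vertex is idle
}

// Formats lag in seconds with one decimal place, or "Idle" when there is no lag value.
function lagLabel(lagMs: number | null): string {
  return lagMs === null ? "Idle" : (lagMs / 1000).toFixed(1) + " s";
}

// Returns a copy sorted by descending lag; idle vertices go last (assumption).
function sortWorstFirst(vertices: VertexLag[]): VertexLag[] {
  return vertices.slice().sort((a, b) => {
    if (a.lagMs === null) {
      return b.lagMs === null ? 0 : 1;
    }
    if (b.lagMs === null) {
      return -1;
    }
    return b.lagMs - a.lagMs;
  });
}
```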