Purushottam Sinha created FLINK-39745:
-----------------------------------------

             Summary: Web dashboard: surface watermark lag as a per-job timeline
                 Key: FLINK-39745
                 URL: https://issues.apache.org/jira/browse/FLINK-39745
             Project: Flink
          Issue Type: New Feature
          Components: Runtime / Web Frontend
            Reporter: Purushottam Sinha


Problem
The current per-operator Watermarks drawer 
(flink-runtime-web/web-dashboard/src/app/pages/job/overview/watermarks/job-overview-drawer-watermarks.component.ts)
 shows only the latest raw epoch-millis watermark per subtask. Users must 
mentally subtract it from the wall clock and re-poll to see whether event time 
is keeping up. The trend, which is the actual diagnostic signal, is invisible.

Evidence
  - Drawer table renders one row per subtask showing only the latest 
currentInputWatermark; previous values are discarded on each refresh.
  - Peer systems (Dataflow "system lag", Materialize lag metric, Confluent 
Cloud Flink) ship a per-stage lag-over-time chart for this data.
  - All required data is already exposed via GET 
/jobs/:id/vertices/:vid/watermarks, so no backend change is needed.
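
For illustration, lag could be derived client-side from that endpoint roughly as follows. This is a minimal sketch, not the proposed implementation. It assumes the response is an array of {id, value} entries with ids such as "0.currentInputWatermark", and that a subtask without a watermark reports a Long.MIN_VALUE-like sentinel. The names WatermarkMetric, NO_WATERMARK_THRESHOLD and vertexLagMs are hypothetical.

```typescript
// Assumed response entry shape, e.g. { id: "0.currentInputWatermark", value: "1700000000000" }.
interface WatermarkMetric {
  id: string;
  value: string;
}

// Assumption: values at or below this are treated as "no watermark yet"
// (Long.MIN_VALUE is about -9.22e18).
const NO_WATERMARK_THRESHOLD = -9e18;

// Returns the vertex lag in milliseconds, or null when the vertex has no
// usable watermark (rendered as "Idle" rather than as infinite lag).
function vertexLagMs(metrics: WatermarkMetric[], nowMs: number): number | null {
  let min: number | null = null;
  for (const m of metrics) {
    const v = Number(m.value);
    if (!Number.isFinite(v)) {
      return null; // unparsable value: treat the vertex as having no watermark
    }
    if (min === null || v < min) {
      min = v; // the vertex watermark is the minimum across its subtasks
    }
  }
  if (min === null || min <= NO_WATERMARK_THRESHOLD) {
    return null;
  }
  return nowMs - min;
}
```

Taking the minimum across subtasks means a single subtask without a watermark marks the whole vertex as idle. Whether to skip such subtasks instead is a design choice this sketch does not settle.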

Proposed fix
  - New job-level "Watermarks" tab: multi-operator lag-over-time chart with a 
configurable alert band, plus per-vertex status cards (current lag, trend, 
healthy/tracked/elevated/idle) sorted worst-first.
  - Inline lag sparkline next to each operator name on the overview operator 
list.
  - Opt-in localStorage persistence of the rolling 30-min buffer so the chart 
survives a page refresh for the same job ID.
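
The rolling buffer with opt-in persistence could look roughly like the sketch below. It is illustrative only: LagSample, KeyValueStore, LagBuffer and the storage key are hypothetical names, and the store is passed in so that the browser's localStorage (which satisfies the KeyValueStore shape) is used only when the user opts in.

```typescript
// One point on the chart; lagMs is null while the vertex is idle.
interface LagSample {
  vertexId: string;
  t: number; // sample time, epoch millis
  lagMs: number | null;
}

// Minimal subset of the Web Storage API; window.localStorage fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const WINDOW_MS = 30 * 60 * 1000; // rolling 30-minute window
const STORAGE_KEY = "flink.watermarkLag"; // hypothetical key name

class LagBuffer {
  private samples: LagSample[] = [];
  private readonly jobId: string;
  private readonly store?: KeyValueStore;

  // Pass a store only when the user has opted in to persistence.
  constructor(jobId: string, store?: KeyValueStore) {
    this.jobId = jobId;
    this.store = store;
    if (store) {
      this.samples = LagBuffer.restore(jobId, store);
    }
  }

  // Appends a sample, drops everything older than the window, then persists.
  push(sample: LagSample): void {
    this.samples.push(sample);
    const cutoff = sample.t - WINDOW_MS;
    this.samples = this.samples.filter((s) => s.t >= cutoff);
    this.store?.setItem(
      STORAGE_KEY,
      JSON.stringify({ jobId: this.jobId, samples: this.samples })
    );
  }

  values(): readonly LagSample[] {
    return this.samples;
  }

  // Restores samples only if they belong to the same job; otherwise clears them.
  // Stale restored samples are pruned on the next push.
  private static restore(jobId: string, store: KeyValueStore): LagSample[] {
    const raw = store.getItem(STORAGE_KEY);
    if (raw === null) {
      return [];
    }
    try {
      const parsed = JSON.parse(raw) as { jobId?: string; samples?: LagSample[] };
      if (parsed.jobId === jobId && Array.isArray(parsed.samples)) {
        return parsed.samples;
      }
    } catch {
      // corrupt entry: fall through and discard it
    }
    store.removeItem(STORAGE_KEY);
    return [];
  }
}
```

In the dashboard this might be constructed as new LagBuffer(jobId, rememberEnabled ? localStorage : undefined), where rememberEnabled is a hypothetical flag bound to the opt-in toggle.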

Acceptance
  - New tab plots one line per vertex with lag in seconds; idle vertices render 
as "Idle" rather than infinite lag.
  - With "Remember across refresh" enabled, the chart history survives a page 
refresh (Cmd+R) and is cleared when the job ID changes.
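
The first criterion could be checked against a small label helper along these lines. This is a sketch under assumptions: formatLag is a hypothetical name, lag is assumed to arrive in milliseconds with null meaning "no watermark", and one decimal place is an arbitrary choice.

```typescript
// Formats lag for display: seconds with one decimal, or "Idle" when the
// vertex has no watermark (null) or the value is not a finite number.
function formatLag(lagMs: number | null): string {
  if (lagMs === null || !Number.isFinite(lagMs)) {
    return "Idle";
  }
  return (lagMs / 1000).toFixed(1) + " s";
}
```

For example, formatLag(1500) yields "1.5 s", while formatLag(null) and formatLag(Infinity) both yield "Idle".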



--
This message was sent by Atlassian Jira
(v8.20.10#820010)