[ https://issues.apache.org/jira/browse/FLINK-39745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18084284#comment-18084284 ]
Purushottam Sinha commented on FLINK-39745:
-------------------------------------------
I will pick this up when time allows. Dumping the idea here for now.
> Web dashboard: surface watermark lag as a per-job timeline
> ----------------------------------------------------------
>
> Key: FLINK-39745
> URL: https://issues.apache.org/jira/browse/FLINK-39745
> Project: Flink
> Issue Type: New Feature
> Components: Runtime / Web Frontend
> Reporter: Purushottam Sinha
> Priority: Major
>
> Problem
> The current per-operator Watermarks drawer
> (flink-runtime-web/web-dashboard/src/app/pages/job/overview/watermarks/job-overview-drawer-watermarks.component.ts)
> shows only the latest raw epoch-millis watermark per subtask. Users must
> mentally subtract it from wall-clock time and re-poll to see whether event
> time is keeping up; the trend, which is the actual diagnostic signal, is
> invisible.
> Evidence
> - Drawer table renders one row per subtask with the latest
> currentInputWatermark only; previous values are discarded on each refresh.
> - Peer systems (Dataflow "system lag", Materialize lag metric, Confluent
> Cloud Flink) ship a per-stage lag-over-time chart for this data.
> - All required data is already exposed via GET
> /jobs/:id/vertices/:vid/watermarks — no backend change needed.
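A minimal sketch of how lag could be derived from that endpoint's payload, client side. It assumes the response is a list of { id, value } entries with the watermark as an epoch-millis string, and that a subtask without a watermark reports Long.MIN_VALUE. The names WatermarkMetric, VertexLag and vertexLag are hypothetical, and taking the minimum over subtasks that do have a watermark is one possible reading of "per-vertex lag", not a settled design.

```typescript
// Assumed payload entry, e.g. { id: "0.currentInputWatermark", value: "1700000000000" }.
interface WatermarkMetric {
  id: string;
  value: string;
}

// Assumed sentinel for "no watermark yet" (Long.MIN_VALUE), compared as a string
// because it does not fit in a JavaScript number without precision loss.
const NO_WATERMARK = '-9223372036854775808';

// Hypothetical result type: either the vertex is idle, or it has a lag in seconds.
type VertexLag = { kind: 'idle' } | { kind: 'lag'; seconds: number };

// Hypothetical helper: lag of one vertex at wall-clock time nowMillis.
function vertexLag(metrics: WatermarkMetric[], nowMillis: number): VertexLag {
  // Drop subtasks that have no watermark or an unparsable value.
  const watermarks = metrics
    .filter(m => m.value !== NO_WATERMARK)
    .map(m => Number(m.value))
    .filter(w => isFinite(w));

  // No usable watermark at all: report "idle" instead of an enormous lag.
  if (watermarks.length === 0) {
    return { kind: 'idle' };
  }

  // The slowest subtask holds the vertex back, so use the minimum watermark.
  const lowest = Math.min(...watermarks);
  return { kind: 'lag', seconds: (nowMillis - lowest) / 1000 };
}
```

Returning a tagged value rather than a number keeps the "Idle" case explicit, so a chart or status card cannot accidentally plot the sentinel as a lag of billions of seconds.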
> Proposed fix
> - New job-level "Watermarks" tab: multi-operator lag-over-time chart with a
> configurable alert band, plus per-vertex status cards (current lag, trend,
> healthy/tracked/elevated/idle) sorted worst-first.
> - Inline lag sparkline next to each operator name on the overview operator
> list.
> - Opt-in localStorage persistence of the rolling 30-min buffer so the chart
> survives a page refresh for the same job ID.
> Acceptance
> - New tab plots one line per vertex with lag in seconds; idle vertices
> render as "Idle" rather than infinite lag.
> - With "Remember across refresh" enabled, the buffer survives Cmd+R and is
> cleared on job-ID change.
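A rough sketch of the rolling 30-minute buffer with opt-in persistence, to make the refresh and job-ID behaviour concrete. Everything named here (LagSample, KeyValueStore, LagBuffer, the storage key) is hypothetical. The store is injected as a small interface that the browser's localStorage happens to satisfy; passing null corresponds to the toggle being off.

```typescript
// One chart point; lagSeconds is null when the vertex was idle at time t.
interface LagSample {
  vertexId: string;
  t: number; // wall-clock epoch millis when the sample was taken
  lagSeconds: number | null;
}

// Subset of the Web Storage API used here; window.localStorage fits this shape.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

const WINDOW_MS = 30 * 60 * 1000; // rolling 30-minute window
const STORAGE_KEY = 'flink.watermarkLag'; // hypothetical key name

class LagBuffer {
  private samples: LagSample[] = [];
  private readonly jobId: string;
  private readonly store: KeyValueStore | null;

  // store === null means persistence is off.
  constructor(jobId: string, store: KeyValueStore | null, nowMillis: number) {
    this.jobId = jobId;
    this.store = store;
    if (store === null) return;
    const raw = store.getItem(STORAGE_KEY);
    if (raw === null) return;
    try {
      const saved = JSON.parse(raw) as { jobId: string; samples: LagSample[] };
      if (saved.jobId === jobId) {
        // Same job: restore, dropping samples older than the window.
        this.samples = saved.samples.filter(s => nowMillis - s.t <= WINDOW_MS);
      } else {
        // Different job: the old buffer is stale, so clear it.
        store.removeItem(STORAGE_KEY);
      }
    } catch {
      // Corrupt or unexpected payload: discard rather than fail.
      store.removeItem(STORAGE_KEY);
    }
  }

  // Append a sample, trim to the window, and persist if enabled.
  push(sample: LagSample): void {
    this.samples.push(sample);
    const cutoff = sample.t - WINDOW_MS;
    this.samples = this.samples.filter(s => s.t >= cutoff);
    if (this.store !== null) {
      this.store.setItem(
        STORAGE_KEY,
        JSON.stringify({ jobId: this.jobId, samples: this.samples }),
      );
    }
  }

  values(): readonly LagSample[] {
    return this.samples;
  }
}
```

In the dashboard this might be constructed as new LagBuffer(jobId, rememberEnabled ? localStorage : null, Date.now()). Storing the job ID alongside the samples is what lets a later page load decide whether to restore or clear.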
--
This message was sent by Atlassian Jira
(v8.20.10#820010)