xingbe created FLINK-30558:
------------------------------

Summary: The metric 'numRestarts' reported in SchedulerBase will be overridden by metric 'fullRestarts'
Key: FLINK-30558
URL: https://issues.apache.org/jira/browse/FLINK-30558
Project: Flink
Issue Type: Bug
Components: Runtime / Metrics
Affects Versions: 1.17.0
Reporter: xingbe
Fix For: 1.17.0
The method SchedulerBase#registerJobMetrics registers the metrics 'numRestarts' and 'fullRestarts' with the same metric object. As discussed in FLINK-30246, this results in the loss of the metric 'numRestarts'.
{code:java}
metrics.gauge(MetricNames.NUM_RESTARTS, numberOfRestarts);
metrics.gauge(MetricNames.FULL_RESTARTS, numberOfRestarts);
{code}
I have verified this problem via the REST API /jobs/:jobid/metrics. The response is shown below: 'fullRestarts' is listed, but 'numRestarts' is missing.
{noformat}
[{"id":"numberOfFailedCheckpoints"},{"id":"cancellingTime"},{"id":"lastCheckpointSize"},{"id":"totalNumberOfCheckpoints"},{"id":"lastCheckpointExternalPath"},{"id":"lastCheckpointRestoreTimestamp"},{"id":"failingTime"},{"id":"runningTime"},{"id":"uptime"},{"id":"restartingTime"},{"id":"initializingTime"},{"id":"numberOfInProgressCheckpoints"},{"id":"downtime"},{"id":"lastCheckpointProcessedData"},{"id":"numberOfCompletedCheckpoints"},{"id":"deployingTime"},{"id":"lastCheckpointFullSize"},{"id":"fullRestarts"},{"id":"createdTime"},{"id":"lastCheckpointDuration"},{"id":"lastCheckpointPersistedData"}]
{noformat}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
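A minimal, self-contained sketch of why reusing one gauge object can drop a name, assuming (per the FLINK-30246 discussion) that the metric dump behind /jobs/:jobid/metrics indexes gauges by the metric object rather than by name. The Gauge interface, ObjectKeyedRegistry and registeredNames below are hypothetical stand-ins, not Flink classes. Wrapping the shared gauge in a distinct delegating instance is only one possible fix and not necessarily the one adopted for this issue.

{code:java}
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetricNameCollision {

    /** Hypothetical stand-in for a gauge metric. */
    interface Gauge<T> {
        T getValue();
    }

    /** Hypothetical registry keyed by the metric object, not by its name. */
    static final class ObjectKeyedRegistry {
        private final Map<Gauge<?>, String> gauges = new LinkedHashMap<>();

        void gauge(String name, Gauge<?> gauge) {
            // Registering the same object again replaces the earlier name.
            gauges.put(gauge, name);
        }

        List<String> names() {
            return new ArrayList<>(gauges.values());
        }
    }

    /** Registers both names and returns the names the registry exposes. */
    public static List<String> registeredNames(boolean shareInstance) {
        long[] restarts = {3L}; // hypothetical restart counter
        Gauge<Long> numberOfRestarts = () -> restarts[0];
        ObjectKeyedRegistry metrics = new ObjectKeyedRegistry();
        metrics.gauge("numRestarts", numberOfRestarts);
        if (shareInstance) {
            // Mirrors the reported code: one object under both names.
            metrics.gauge("fullRestarts", numberOfRestarts);
        } else {
            // Possible fix: a distinct object that delegates to the same value.
            metrics.gauge("fullRestarts", () -> numberOfRestarts.getValue());
        }
        return metrics.names();
    }

    public static void main(String[] args) {
        System.out.println(registeredNames(true));  // [fullRestarts]
        System.out.println(registeredNames(false)); // [numRestarts, fullRestarts]
    }
}
{code}

With a shared instance only 'fullRestarts' survives, matching the REST response above; with distinct instances both names are kept while still reporting the same restart count.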