xingbe created FLINK-30558:
------------------------------
Summary: The metric 'numRestarts' reported in SchedulerBase will
be overridden by metric 'fullRestarts'
Key: FLINK-30558
URL: https://issues.apache.org/jira/browse/FLINK-30558
Project: Flink
Issue Type: Bug
Components: Runtime / Metrics
Affects Versions: 1.17.0
Reporter: xingbe
Fix For: 1.17.0
The method SchedulerBase#registerJobMetrics registers the metrics 'numRestarts' and
'fullRestarts' with the same metric object. As discussed in FLINK-30246, this
results in the loss of the metric 'numRestarts'.
{code:java}
metrics.gauge(MetricNames.NUM_RESTARTS, numberOfRestarts);
metrics.gauge(MetricNames.FULL_RESTARTS, numberOfRestarts);{code}
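The effect can be reproduced in isolation with a minimal sketch. It assumes that the component tracking the metrics keys its bookkeeping map by the gauge object, which is one way a second registration could override the first; this report does not show the actual Flink internals. ToyRegistry and the register helper are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.Supplier;

public class DuplicateGaugeDemo {

    /** Hypothetical stand-in for a registry that keys its map by the gauge object. */
    static final class ToyRegistry {
        private final Map<Supplier<Long>, String> gauges = new HashMap<>();

        void gauge(String name, Supplier<Long> gauge) {
            // put() on an already present key replaces the previously stored name.
            gauges.put(gauge, name);
        }

        Set<String> names() {
            return new TreeSet<>(gauges.values());
        }
    }

    /** Registers both metric names, either with one shared gauge object or with two distinct ones. */
    public static Set<String> register(boolean shareGaugeObject) {
        Supplier<Long> numberOfRestarts = () -> 0L;
        ToyRegistry registry = new ToyRegistry();

        registry.gauge("numRestarts", numberOfRestarts);
        if (shareGaugeObject) {
            // Same object registered again: the entry for "numRestarts" is overwritten.
            registry.gauge("fullRestarts", numberOfRestarts);
        } else {
            // A method reference yields a separate object that delegates to the same gauge.
            registry.gauge("fullRestarts", numberOfRestarts::get);
        }
        return registry.names();
    }

    public static void main(String[] args) {
        System.out.println(register(true));  // prints [fullRestarts]
        System.out.println(register(false)); // prints [fullRestarts, numRestarts]
    }
}
```

Under that assumption, one possible remedy is to register the second name with a distinct wrapper object that delegates to the same gauge, as the else branch does. Whether this matches the eventual fix is not stated in this report.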
I verified this problem via the REST API endpoint /jobs/:jobid/metrics. In the
response shown below, the metric 'numRestarts' is missing.
{noformat}
[{"id":"numberOfFailedCheckpoints"},{"id":"cancellingTime"},{"id":"lastCheckpointSize"},{"id":"totalNumberOfCheckpoints"},{"id":"lastCheckpointExternalPath"},{"id":"lastCheckpointRestoreTimestamp"},{"id":"failingTime"},{"id":"runningTime"},{"id":"uptime"},{"id":"restartingTime"},{"id":"initializingTime"},{"id":"numberOfInProgressCheckpoints"},{"id":"downtime"},{"id":"lastCheckpointProcessedData"},{"id":"numberOfCompletedCheckpoints"},{"id":"deployingTime"},{"id":"lastCheckpointFullSize"},{"id":"fullRestarts"},{"id":"createdTime"},{"id":"lastCheckpointDuration"},{"id":"lastCheckpointPersistedData"}]{noformat}