AHeise commented on a change in pull request #10931: [FLINK-15603][metrics] Add
checkpointStartDelay metric
URL: https://github.com/apache/flink/pull/10931#discussion_r370522304
##########
File path: docs/monitoring/metrics.md
##########
@@ -1341,11 +1341,16 @@ Metrics related to data exchange between task
executors using netty network comm
<td>Gauge</td>
</tr>
<tr>
- <th rowspan="1">Task</th>
+ <th rowspan="2">Task</th>
<td>checkpointAlignmentTime</td>
<td>The time in nanoseconds that the last barrier alignment took to
complete, or how long the current alignment has taken so far (in
nanoseconds).</td>
<td>Gauge</td>
</tr>
+ <tr>
+ <td>checkpointStartDelay</td>
+ <td>The time in nanoseconds that elapsed between the creation of the
last checkpoint and the time when the checkpointing process has started by this
Task. This delay shows how long it takes for a first checkpoint barrier to
reach the task. Back-pressure will increase this value.</td>
Review comment:
Just to make sure this was not an oversight:
I also suggested replacing "Back-pressure will increase this value." with
"A high value indicates back-pressure. If only a specific task has a long
start delay, the most likely reason is data skew."
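
To illustrate the distinction the suggested wording draws, here is a minimal sketch of how a reader might interpret per-task checkpointStartDelay readings. Everything in it is an assumption made for illustration: the class StartDelayDiagnosis, the Verdict enum, and the threshold are hypothetical and not part of Flink, and the "exactly one slow task" rule is a deliberately crude stand-in for "only a specific task has a long start delay".

```java
import java.util.Map;

/** Hypothetical helper, not part of Flink: interprets checkpointStartDelay values per task. */
final class StartDelayDiagnosis {

    enum Verdict { HEALTHY, BACK_PRESSURE, LIKELY_DATA_SKEW }

    /**
     * @param startDelayNanos checkpointStartDelay per task name, in nanoseconds
     * @param thresholdNanos  delay above which a task counts as slow (assumed value, tune per job)
     */
    static Verdict diagnose(Map<String, Long> startDelayNanos, long thresholdNanos) {
        // Count the tasks whose first barrier arrived later than the threshold allows.
        long slowTasks = startDelayNanos.values().stream()
                .filter(delay -> delay > thresholdNanos)
                .count();

        if (slowTasks == 0) {
            return Verdict.HEALTHY;
        }
        // A single slow task among several points at skew rather than general back-pressure.
        if (slowTasks == 1 && startDelayNanos.size() > 1) {
            return Verdict.LIKELY_DATA_SKEW;
        }
        return Verdict.BACK_PRESSURE;
    }
}
```

With a threshold of one second (1_000_000_000 ns), a job where every task reports a multi-second delay would be classified as BACK_PRESSURE, while a job where only one of several tasks does would be classified as LIKELY_DATA_SKEW.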