[
https://issues.apache.org/jira/browse/FLINK-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15473745#comment-15473745
]
ASF GitHub Bot commented on FLINK-3660:
---------------------------------------
Github user zentol commented on a diff in the pull request:
https://github.com/apache/flink/pull/2386#discussion_r77996152
--- Diff: docs/monitoring/metrics.md ---
@@ -475,52 +474,76 @@ Flink exposes the following system metrics:
<td></td>
</tr>
<tr>
- <tr>
- <th rowspan="7"><strong>Task</strong></t>
- <td>currentLowWatermark</td>
- <td>The lowest watermark a task has received.</td>
- </tr>
- <tr>
- <td>lastCheckpointDuration</td>
- <td>The time it took to complete the last checkpoint.</td>
- </tr>
- <tr>
- <td>lastCheckpointSize</td>
- <td>The total size of the last checkpoint.</td>
- </tr>
- <tr>
- <td>restartingTime</td>
- <td>The time it took to restart the job.</td>
- </tr>
- <tr>
- <td>numBytesInLocal</td>
- <td>The total number of bytes this task has read from a local
source.</td>
- </tr>
- <tr>
- <td>numBytesInRemote</td>
- <td>The total number of bytes this task has read from a remote
source.</td>
- </tr>
- <tr>
- <td>numBytesOut</td>
- <td>The total number of bytes this task has emitted.</td>
- </tr>
- </tr>
- <tr>
- <tr>
- <th rowspan="3"><strong>Operator</strong></th>
- <td>numRecordsIn</td>
- <td>The total number of records this operator has received.</td>
- </tr>
- <tr>
- <td>numRecordsOut</td>
- <td>The total number of records this operator has emitted.</td>
- </tr>
- <tr>
- <td>numSplitsProcessed</td>
- <td>The total number of InputSplits this data source has
processed.</td>
- </tr>
+ <th rowspan="7"><strong>Task</strong></th>
+ <td>currentLowWatermark</td>
+ <td>The lowest watermark a task has received.</td>
+ </tr>
+ <tr>
+ <td>lastCheckpointDuration</td>
+ <td>The time it took to complete the last checkpoint.</td>
+ </tr>
+ <tr>
+ <td>lastCheckpointSize</td>
+ <td>The total size of the last checkpoint.</td>
+ </tr>
+ <tr>
+ <td>restartingTime</td>
+ <td>The time it took to restart the job.</td>
+ </tr>
+ <tr>
+ <td>numBytesInLocal</td>
+ <td>The total number of bytes this task has read from a local
source.</td>
+ </tr>
+ <tr>
+ <td>numBytesInRemote</td>
+ <td>The total number of bytes this task has read from a remote
source.</td>
+ </tr>
+ <tr>
+ <td>numBytesOut</td>
+ <td>The total number of bytes this task has emitted.</td>
+ </tr>
+ <tr>
+ <th rowspan="4"><strong>Operator</strong></th>
+ <td>numRecordsIn</td>
+ <td>The total number of records this operator has received.</td>
+ </tr>
+ <tr>
+ <td>numRecordsOut</td>
+ <td>The total number of records this operator has emitted.</td>
+ </tr>
+ <tr>
+ <td>numSplitsProcessed</td>
+ <td>The total number of InputSplits this data source has processed
(if the operator is a data source).</td>
+ </tr>
+ <tr>
+ <td>latency</td>
+ <td>A latency gauge reporting the latency distribution from the
different sources.</td>
</tr>
</tbody>
</table>
+
+### Latency tracking
+
+Flink allows to track the latency of records traveling through the system.
To enable the latency tracking
+a `latencyTrackingInterval` (in milliseconds) has to be set to a positive
value in the `ExecutionConfig`.
+
+At the `latencyTrackingInterval`, the sources will periodically emit a
special record, called a `LatencyMaker`.
--- End diff --
LatencyMaker -> LatencyMarker
> Measure latency of elements and expose it through web interface
> ---------------------------------------------------------------
>
> Key: FLINK-3660
> URL: https://issues.apache.org/jira/browse/FLINK-3660
> Project: Flink
> Issue Type: Sub-task
> Components: Streaming
> Reporter: Robert Metzger
> Assignee: Robert Metzger
> Fix For: pre-apache
>
>
> It would be nice to expose the end-to-end latency of a streaming job in the
> webinterface.
> To achieve this, my initial thought was to attach an ingestion-time timestamp
> at the sources to each record.
> However, this introduces overhead for a monitoring feature users might not
> even use (8 bytes for each element + System.currentTimeMilis() on each
> element).
> Therefore, I suggest to implement this feature by periodically sending
> special events, similar to watermarks through the topology.
> Those {{LatencyMarks}} are emitted at a configurable interval at the sources
> and forwarded by the tasks. The sinks will compare the timestamp of the
> latency marks with their current system time to determine the latency.
> The latency marks will not add to the latency of a job, but the marks will be
> delayed similarly than regular records, so their latency will approximate the
> record latency.
> Above suggestion expects the clocks on all taskmanagers to be in sync.
> Otherwise, the measured latencies would also include the offsets between the
> taskmanager's clocks.
> In a second step, we can try to mitigate the issue by using the JobManager as
> a central timing service. The TaskManagers will periodically query the JM for
> the current time in order to determine the offset with their clock.
> This offset would still include the network latency between TM and JM but it
> would still lead to reasonably good estimations.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)