[ https://issues.apache.org/jira/browse/FLINK-12373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321512#comment-17321512 ]

Flink Jira Bot commented on FLINK-12373:
----------------------------------------

This issue and all of its Sub-Tasks have not been updated for 180 days, so it
has been labeled "stale-minor". If you are still affected by this bug or are
still interested in this issue, please give an update and remove the label. In
7 days the issue will be closed automatically.

> Improve checkpointing metrics
> -----------------------------
>
>                 Key: FLINK-12373
>                 URL: https://issues.apache.org/jira/browse/FLINK-12373
>             Project: Flink
>          Issue Type: New Feature
>          Components: Runtime / Checkpointing
>            Reporter: Gyula Fora
>            Priority: Minor
>              Labels: stale-minor
>
> The checkpoint metrics encapsulated in the CheckpointMetrics class currently
> expose 4 core metrics for each operator: bytesBuffered, alignment time, sync
> duration and async duration.
> I think it would be a great improvement to break up the tracking of the sync
> duration into its different components, as it contains information that is
> critical for improving the SLA of large jobs.
> I suggest we break up the sync duration into 4 subcomponents:
>  1. prepareSnapshotPreBarrier
>  2. Snapshot timers
>  3. Snapshot operator states
>  4. Sync keyed state checkpoint
> Maybe the operator state part could be further broken up into keyed/non-keyed
> parts; I don't know.
> I think knowing these metrics is crucial for users to minimise the latency 
> caused by checkpointing.
> Whether we want to show all this info on the web ui is another discussion :)
>  
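A minimal Java sketch of the breakdown proposed above, assuming nothing about Flink's real internals: the class SyncDurationBreakdown, its Phase enum and the time(...) helper are hypothetical names for illustration only, not part of Flink's CheckpointMetrics API. Each of the four sync-phase steps is timed separately, and the single sync duration reported today is recovered as the sum of the parts.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not Flink's actual API): tracks the synchronous part of
 * a checkpoint as four separately timed phases, as suggested in FLINK-12373.
 */
final class SyncDurationBreakdown {

    /** The four sync-phase subcomponents listed in the issue. */
    enum Phase {
        PREPARE_SNAPSHOT_PRE_BARRIER,
        SNAPSHOT_TIMERS,
        SNAPSHOT_OPERATOR_STATES,
        SYNC_KEYED_STATE_CHECKPOINT
    }

    private final Map<Phase, Long> durationsNanos = new EnumMap<>(Phase.class);
    private final LongSupplier clock;

    /** The clock is injectable so the sketch can be tested deterministically. */
    SyncDurationBreakdown(LongSupplier clock) {
        this.clock = clock;
    }

    SyncDurationBreakdown() {
        this(System::nanoTime);
    }

    /** Runs one phase and adds its elapsed time to that phase's total, even if it throws. */
    void time(Phase phase, Runnable work) {
        long start = clock.getAsLong();
        try {
            work.run();
        } finally {
            durationsNanos.merge(phase, clock.getAsLong() - start, Long::sum);
        }
    }

    /** Accumulated time for one phase; 0 if the phase never ran. */
    long durationNanos(Phase phase) {
        return durationsNanos.getOrDefault(phase, 0L);
    }

    /** The existing single "sync duration" is the sum of all phases. */
    long totalSyncDurationNanos() {
        long total = 0;
        for (long d : durationsNanos.values()) {
            total += d;
        }
        return total;
    }
}
```

One reason to structure it this way is that the total stays comparable to the current single sync-duration value, while the per-phase numbers show where the time actually goes; whether they are also shown in the web UI is, as the issue notes, a separate discussion.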



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
