[ https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651915#comment-17651915 ]

Hangxiang Yu commented on FLINK-23411:
--------------------------------------

I think this would help users find the cause of checkpoint problems after the
job has failed or stopped.
BTW, the task scope already has alignment and start delay metrics [1], so I
think we could simply add end-to-end duration, checkpointed data size, full
checkpoint data size, sync duration, and async duration to the task scope as
well.

[1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/metrics/#checkpointing
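A rough sketch of the task-scope proposal above follows. Everything in it is hypothetical: the class names (`CheckpointSnapshot`, `TaskCheckpointMetrics`) and the metric names are illustrative assumptions, not existing Flink classes or metrics. It keeps the latest per-task checkpoint statistics in a single volatile reference and exposes one read-only gauge per value, using plain `LongSupplier`s so the sketch has no Flink dependency.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical immutable snapshot of one task's stats for a single completed checkpoint.
final class CheckpointSnapshot {
    final long e2eDurationMs;
    final long checkpointedSizeBytes;
    final long fullSizeBytes;
    final long syncDurationMs;
    final long asyncDurationMs;

    CheckpointSnapshot(long e2eDurationMs, long checkpointedSizeBytes, long fullSizeBytes,
                       long syncDurationMs, long asyncDurationMs) {
        this.e2eDurationMs = e2eDurationMs;
        this.checkpointedSizeBytes = checkpointedSizeBytes;
        this.fullSizeBytes = fullSizeBytes;
        this.syncDurationMs = syncDurationMs;
        this.asyncDurationMs = asyncDurationMs;
    }
}

// Hypothetical per-task holder (not a Flink class) backing the proposed gauges.
final class TaskCheckpointMetrics {
    // Placeholder values reported before any checkpoint has completed.
    private static final CheckpointSnapshot NONE = new CheckpointSnapshot(-1, -1, -1, -1, -1);

    // Single volatile reference: each gauge read sees a fully published snapshot.
    private volatile CheckpointSnapshot latest = NONE;

    // Called by the task once it has finished its part of a checkpoint.
    void onCheckpointCompleted(CheckpointSnapshot snapshot) {
        latest = snapshot;
    }

    // One read-only gauge per statistic, keyed by an illustrative metric name.
    Map<String, LongSupplier> gauges() {
        Map<String, LongSupplier> gauges = new LinkedHashMap<>();
        gauges.put("lastCheckpointE2EDurationMs", () -> latest.e2eDurationMs);
        gauges.put("lastCheckpointedSizeBytes", () -> latest.checkpointedSizeBytes);
        gauges.put("lastCheckpointFullSizeBytes", () -> latest.fullSizeBytes);
        gauges.put("lastCheckpointSyncDurationMs", () -> latest.syncDurationMs);
        gauges.put("lastCheckpointAsyncDurationMs", () -> latest.asyncDurationMs);
        return gauges;
    }
}
```

In Flink itself, each supplier would presumably be wrapped in a `Gauge` and registered on the task's `MetricGroup` via `gauge(name, gauge)`, so the values reach any configured reporter even when the Web UI is disabled. Note that a reporter reading the five gauges one after another could still observe values from two consecutive checkpoints if an update lands in between.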

> Expose Flink checkpoint details metrics
> ---------------------------------------
>
>                 Key: FLINK-23411
>                 URL: https://issues.apache.org/jira/browse/FLINK-23411
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: Jun Qin
>            Priority: Major
>
> The checkpoint metrics shown in the Flink Web UI, such as the
> sync/async/alignment/start delay, are not exposed to the metrics system.
> This makes problem investigation harder when the Web UI is not enabled,
> because those numbers cannot be found in the DEBUG logs either. I think we
> should look into how to expose them as metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
