[
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17651915#comment-17651915
]
Hangxiang Yu commented on FLINK-23411:
--------------------------------------
I think this would help users find the cause of checkpoint problems after
the job fails or stops.
BTW, alignment and start delay metrics already exist in the task scope [1],
so I think we could just add e2e duration, checkpointed data size, full
checkpoint data size, sync duration, and async duration in the task scope.
[1] https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/ops/metrics/#checkpointing
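
As a rough illustration of the proposal, the sketch below shows one way the five additional values could be held as task-scope gauges that always report the most recently completed checkpoint. It is plain Python rather than Flink code: `TaskMetricGroup`, `TaskCheckpointMetrics`, `CheckpointStats`, the metric names, and the -1 "no checkpoint yet" default are all hypothetical names chosen for this example, not Flink's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class CheckpointStats:
    """Per-task numbers for one completed checkpoint (hypothetical container)."""
    e2e_duration_ms: int = -1
    checkpointed_size_bytes: int = -1
    full_size_bytes: int = -1
    sync_duration_ms: int = -1
    async_duration_ms: int = -1


class TaskMetricGroup:
    """Stand-in for a task-scope metric group; NOT Flink's real class."""

    def __init__(self) -> None:
        self._gauges: Dict[str, Callable[[], int]] = {}

    def gauge(self, name: str, supplier: Callable[[], int]) -> None:
        # A gauge is a callback that is evaluated whenever metrics are reported.
        self._gauges[name] = supplier

    def snapshot(self) -> Dict[str, int]:
        # Evaluate every gauge, as a metrics reporter would on each interval.
        return {name: supplier() for name, supplier in self._gauges.items()}


class TaskCheckpointMetrics:
    """Registers one gauge per proposed metric (metric names are assumptions)."""

    def __init__(self, group: TaskMetricGroup) -> None:
        self._last = CheckpointStats()  # -1 everywhere until a checkpoint completes
        group.gauge("lastCheckpointE2EDurationMs", lambda: self._last.e2e_duration_ms)
        group.gauge("lastCheckpointedSizeBytes", lambda: self._last.checkpointed_size_bytes)
        group.gauge("lastCheckpointFullSizeBytes", lambda: self._last.full_size_bytes)
        group.gauge("lastCheckpointSyncDurationMs", lambda: self._last.sync_duration_ms)
        group.gauge("lastCheckpointAsyncDurationMs", lambda: self._last.async_duration_ms)

    def report_completed(self, stats: CheckpointStats) -> None:
        # Called once per completed checkpoint; the gauges read the new values lazily.
        self._last = stats


if __name__ == "__main__":
    group = TaskMetricGroup()
    metrics = TaskCheckpointMetrics(group)
    metrics.report_completed(CheckpointStats(1200, 4096, 65536, 30, 900))
    for name, value in group.snapshot().items():
        print(f"{name} = {value}")
```

Because the gauges only hold the last completed checkpoint, an external metrics reporter that scrapes them periodically would keep a history that is still available after the job fails or stops, which is the use case described above.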
> Expose Flink checkpoint details metrics
> ---------------------------------------
>
> Key: FLINK-23411
> URL: https://issues.apache.org/jira/browse/FLINK-23411
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Affects Versions: 1.13.1, 1.12.4
> Reporter: Jun Qin
> Priority: Major
>
> The checkpoint metrics shown in the Flink Web UI, such as the
> sync/async/alignment/start delay, are not exposed to the metrics system. This
> makes problem investigation harder when the Web UI is not enabled: those
> numbers are not available in the DEBUG logs. I think we should see how we can
> expose these metrics.