[
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764038#comment-17764038
]
Hangxiang Yu commented on FLINK-23411:
--------------------------------------
Hi [~pnowojski], thanks for picking this up.
I think this is indeed a problem that all task-level metrics have;
checkpoint-related metrics just make it more obvious, since they are related to
checkpoint duration.
[distributed
tracing|https://newrelic.com/blog/how-to-relic/distributed-tracing-anomaly-detection]
and OTEL sound like an interesting idea. Maybe we could still register some
task-level metrics like this that could be unregistered, and that could work
with OTEL.
I'm fine with resolving FLINK-33071 first.
> Expose Flink checkpoint details metrics
> ---------------------------------------
>
> Key: FLINK-23411
> URL: https://issues.apache.org/jira/browse/FLINK-23411
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Affects Versions: 1.13.1, 1.12.4
> Reporter: Jun Qin
> Assignee: Hangxiang Yu
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>
> The checkpoint metrics shown in the Flink Web UI, such as the
> sync/async/alignment/start delay, are not exposed to the metrics system. This
> makes problem investigation harder when the Web UI is not enabled: those
> numbers cannot be obtained from the DEBUG logs. I think we should see how we
> can expose these metrics.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)