[
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764113#comment-17764113
]
Piotr Nowojski commented on FLINK-23411:
----------------------------------------
{quote}
maybe we could still register some task level metrics like this which could be
unregistered, and it could work with OTEL.
{quote}
I think the proper solution would still need something like that, but
generically on a lower level. Even if we add tracing support, not all metric
reporters will be able to handle that. In those cases we would either:
* ignore the problem and accept that some metric reporters wouldn't be able to
report everything
* apply a partial fix - maybe Flink could convert traces into {{last***}}
metrics; see {{lastCheckpointDuration}} for an existing example.
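To illustrate the second option, here is a minimal sketch of how a finished trace could be folded into {{last***}} gauges for reporters that cannot handle traces. It is self-contained and does not use Flink or OTEL classes: {{CheckpointSpan}}, {{LastValueMetricBridge}} and the attribute names are hypothetical and only there to show the idea.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a finished checkpoint trace; not a Flink or OTEL type.
final class CheckpointSpan {
    final long checkpointId;
    final Map<String, Long> attributes; // e.g. "checkpointDuration" -> milliseconds

    CheckpointSpan(long checkpointId, Map<String, Long> attributes) {
        this.checkpointId = checkpointId;
        this.attributes = attributes;
    }
}

// Hypothetical bridge: keeps only the most recent value of each span attribute
// and exposes it as a gauge named "last<Attribute>".
final class LastValueMetricBridge {
    private final Map<String, Long> lastValues = new ConcurrentHashMap<>();

    // "checkpointDuration" -> "lastCheckpointDuration"
    static String gaugeName(String attribute) {
        if (attribute.isEmpty()) {
            throw new IllegalArgumentException("attribute name must not be empty");
        }
        return "last" + Character.toUpperCase(attribute.charAt(0)) + attribute.substring(1);
    }

    // Called once per finished span; older values are overwritten.
    void onSpanFinished(CheckpointSpan span) {
        span.attributes.forEach((name, value) -> lastValues.put(gaugeName(name), value));
    }

    // Returns null from get() until a span carrying that attribute has been seen.
    Supplier<Long> gauge(String name) {
        return () -> lastValues.get(name);
    }
}

final class LastValueMetricBridgeDemo {
    public static void main(String[] args) {
        LastValueMetricBridge bridge = new LastValueMetricBridge();
        Supplier<Long> lastDuration = bridge.gauge("lastCheckpointDuration");

        bridge.onSpanFinished(new CheckpointSpan(1L, Map.of("checkpointDuration", 120L)));
        bridge.onSpanFinished(new CheckpointSpan(2L, Map.of("checkpointDuration", 80L)));

        // The gauge now reflects checkpoint 2 only; the history is lost.
        System.out.println("lastCheckpointDuration = " + lastDuration.get());
    }
}
```

The trade-off is visible in the demo: a reporter polling the gauge sees only the most recent checkpoint, which is why this would be a partial fix at best.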
Anyway, I think the best way would be to first think through the OTEL/Traces
integration, add support for that, and then add the new checkpointing metrics
from this ticket in that new model. However, it's indeed much more work
(including writing and voting on a FLIP), so I'm also fine if you would prefer
to first (hopefully temporarily) add the checkpoint metrics more or less the way
you are proposing right now.
One thing is that, in order not to bloat the metric system too much, we should
implement this as an opt-in feature, hidden behind a feature toggle that users
would have to enable manually in order to see those metrics.
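A rough sketch of what I mean by the toggle, in plain Java without Flink classes. The option key {{metrics.checkpoint.details.enabled}}, the class {{DetailedCheckpointMetrics}} and the map-based registry are all made up for illustration; the real option name and registration path would have to be decided in the PR.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class DetailedCheckpointMetrics {
    // Hypothetical option key; not an existing Flink configuration option.
    static final String ENABLED_KEY = "metrics.checkpoint.details.enabled";

    // Opt-in: a missing key means the detailed metrics stay disabled.
    static boolean isEnabled(Map<String, String> config) {
        return Boolean.parseBoolean(config.getOrDefault(ENABLED_KEY, "false"));
    }

    // Adds the detailed gauges to the registry only when the toggle is on,
    // so the default metric set does not grow.
    static void registerIfEnabled(
            Map<String, String> config,
            Map<String, Supplier<Long>> registry,
            Map<String, Supplier<Long>> detailGauges) {
        if (!isEnabled(config)) {
            return;
        }
        registry.putAll(detailGauges);
    }
}

final class DetailedCheckpointMetricsDemo {
    public static void main(String[] args) {
        // Placeholder gauges with fixed values, standing in for the real measurements.
        Supplier<Long> startDelay = () -> 5L;
        Supplier<Long> alignmentDuration = () -> 12L;
        Map<String, Supplier<Long>> details = new HashMap<>();
        details.put("checkpointStartDelay", startDelay);
        details.put("checkpointAlignmentDuration", alignmentDuration);

        Map<String, Supplier<Long>> defaultRegistry = new HashMap<>();
        DetailedCheckpointMetrics.registerIfEnabled(new HashMap<>(), defaultRegistry, details);

        Map<String, String> optIn = new HashMap<>();
        optIn.put(DetailedCheckpointMetrics.ENABLED_KEY, "true");
        Map<String, Supplier<Long>> optInRegistry = new HashMap<>();
        DetailedCheckpointMetrics.registerIfEnabled(optIn, optInRegistry, details);

        System.out.println("registered by default: " + defaultRegistry.keySet());
        System.out.println("registered after opt-in: " + optInRegistry.keySet());
    }
}
```

Checking the toggle at registration time, rather than when the values are reported, means that disabled metrics never reach any reporter at all.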
> Expose Flink checkpoint details metrics
> ---------------------------------------
>
> Key: FLINK-23411
> URL: https://issues.apache.org/jira/browse/FLINK-23411
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Metrics
> Affects Versions: 1.13.1, 1.12.4
> Reporter: Jun Qin
> Assignee: Hangxiang Yu
> Priority: Major
> Labels: pull-request-available, stale-assigned
> Fix For: 1.18.0
>
>
> The checkpoint metrics as shown in the Flink Web UI like the
> sync/async/alignment/start delay are not exposed to the metrics system. This
> makes problem investigation harder when the Web UI is not enabled: those numbers
> cannot be obtained from the DEBUG logs. I think we should see how we can expose
> them as metrics.