[ 
https://issues.apache.org/jira/browse/FLINK-23411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764113#comment-17764113
 ] 

Piotr Nowojski commented on FLINK-23411:
----------------------------------------

{quote}
maybe we could still register some task level metrics like this which could be 
unregistered, and it could work with OTEL.
{quote}
I think the proper solution would still need something like that, but done 
generically at a lower level. Even if we add tracing support, not all metric 
reporters will be able to handle traces. In those cases we would either:
* ignore the problem and accept that some metric reporters won't be able to 
report everything, or
* apply a partial fix: maybe Flink could convert traces into {{last***}} 
metrics, as with {{lastCheckpointDuration}}.
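
To make the second option concrete, here is a minimal sketch of turning finished 
spans into "last value" metrics. It assumes a span boils down to a non-empty name 
plus start and end timestamps; {{LastValueMetrics}}, {{onSpanFinished}} and 
{{lastValue}} are hypothetical names for illustration, not existing Flink APIs.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are existing Flink APIs.
final class LastValueMetrics {
    // Latest duration per derived metric name, e.g. "lastCheckpointDuration".
    private final Map<String, Long> lastValues = new ConcurrentHashMap<>();

    // Called when a span ends; a newer span overwrites the previous value.
    void onSpanFinished(String spanName, long startMillis, long endMillis) {
        lastValues.put(metricName(spanName), endMillis - startMillis);
    }

    // Maps a span name to a "last***" metric name: "checkpoint" -> "lastCheckpointDuration".
    static String metricName(String spanName) {
        return "last" + Character.toUpperCase(spanName.charAt(0))
                + spanName.substring(1) + "Duration";
    }

    // The value a gauge for this metric would report; -1 until a span was seen.
    long lastValue(String metricName) {
        return lastValues.getOrDefault(metricName, -1L);
    }
}
```

Only the most recent span survives, so a reporter polling less often than 
checkpoints complete would miss values. That is why this is only a partial fix.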

Anyway, I think the best way would be to first think through the OTEL/traces 
integration, add support for it, and then add the new checkpointing metrics 
from this ticket in that new model. However, that is indeed much more work 
(including writing and voting on a FLIP), so I'm also fine if you would prefer 
to first (hopefully temporarily) add the checkpoint metrics more or less as you 
are proposing right now.

One more thing: to avoid bloating the metric system too much, we should 
implement this as an opt-in feature, hidden behind a feature toggle that users 
would have to enable manually in order to see those metrics.
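
Roughly, the toggle could gate registration like the sketch below. The option 
key {{metrics.checkpoint-details.enabled}}, the class {{CheckpointDetailMetrics}} 
and the plain-map "registry" are all assumptions made for illustration; none of 
them are existing Flink options or APIs.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch: the option key and class are not existing Flink APIs.
final class CheckpointDetailMetrics {
    // Assumed option name; the feature is off unless explicitly set to "true".
    static final String ENABLED_KEY = "metrics.checkpoint-details.enabled";

    // Returns the gauges that would be registered: all candidates when the
    // toggle is on, none when it is off or missing.
    static Map<String, LongSupplier> register(
            Map<String, String> config, Map<String, LongSupplier> candidates) {
        boolean enabled =
                Boolean.parseBoolean(config.getOrDefault(ENABLED_KEY, "false"));
        if (!enabled) {
            return Collections.emptyMap();
        }
        return new LinkedHashMap<>(candidates);
    }
}
```

Defaulting to "false" keeps the existing metric set unchanged for users who 
never touch the option.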

> Expose Flink checkpoint details metrics
> ---------------------------------------
>
>                 Key: FLINK-23411
>                 URL: https://issues.apache.org/jira/browse/FLINK-23411
>             Project: Flink
>          Issue Type: Improvement
>          Components: Runtime / Metrics
>    Affects Versions: 1.13.1, 1.12.4
>            Reporter: Jun Qin
>            Assignee: Hangxiang Yu
>            Priority: Major
>              Labels: pull-request-available, stale-assigned
>             Fix For: 1.18.0
>
>
> The checkpoint metrics shown in the Flink Web UI, such as the 
> sync/async/alignment/start delay, are not exposed to the metrics system. This 
> makes problem investigation harder when the Web UI is not enabled: those 
> numbers do not appear in the DEBUG logs. I think we should see how we can 
> expose these metrics.


