[ 
https://issues.apache.org/jira/browse/FLINK-8043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16334640#comment-16334640
 ] 

Steven Zhen Wu commented on FLINK-8043:
---------------------------------------

[~till.rohrmann] Thanks for the explanation. Let me close this JIRA.

We need to be able to set up an alert when global recoveries or task failures
exceed a certain threshold, so we will have to convert "fullRestarts" from a
Gauge to a Counter on our end.
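
A rough sketch of what that conversion could look like on the consumer side,
assuming the gauge reports a cumulative restart count that only goes backwards
when the job is resubmitted. The class names (GaugeToCounter, WindowedAlert)
and the reset handling are illustrative assumptions, not Flink APIs.

```python
from collections import deque


class GaugeToCounter:
    """Turns successive readings of a cumulative gauge into counter increments."""

    def __init__(self):
        self._last = None  # previous gauge reading; None until the first sample

    def update(self, reading):
        """Return how much the gauge grew since the previous reading."""
        if self._last is None:
            # The first sample only establishes a baseline.
            delta = 0
        elif reading < self._last:
            # Assumption: a smaller value means the gauge restarted from zero.
            delta = reading
        else:
            delta = reading - self._last
        self._last = reading
        return delta


class WindowedAlert:
    """Fires when the summed increments in a sliding time window reach a threshold."""

    def __init__(self, threshold, window_seconds):
        self._threshold = threshold
        self._window = window_seconds
        self._events = deque()  # (timestamp, delta) pairs, oldest first

    def record(self, timestamp, delta):
        """Record an increment and return True if the alert should fire."""
        if delta > 0:
            self._events.append((timestamp, delta))
        # Drop increments that have fallen out of the window.
        while self._events and self._events[0][0] <= timestamp - self._window:
            self._events.popleft()
        return sum(d for _, d in self._events) >= self._threshold
```

For the alert described in the issue (10 restarts within 15 minutes), this
would be wired up as WindowedAlert(threshold=10, window_seconds=900) and fed
with GaugeToCounter.update(...) on every scrape of the gauge.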

> change fullRestarts (for fine grained recovery) from gauge to counter
> ---------------------------------------------------------------------
>
>                 Key: FLINK-8043
>                 URL: https://issues.apache.org/jira/browse/FLINK-8043
>             Project: Flink
>          Issue Type: Bug
>          Components: ResourceManager
>    Affects Versions: 1.3.2
>            Reporter: Steven Zhen Wu
>            Priority: Blocker
>             Fix For: 1.5.0, 1.4.1
>
>
> Fine-grained recovery publishes fullRestarts as a gauge, which is not
> suitable for threshold-based alerting. Usually we would alert on something
> like "fullRestarts > 0 happens 10 times in the last 15 minutes".
> In comparison, "task_failures" is published as a counter.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
