[ 
https://issues.apache.org/jira/browse/KUDU-2287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437720#comment-16437720
 ] 

Todd Lipcon commented on KUDU-2287:
-----------------------------------

This would also be useful on masters. We've seen cases where one of three 
masters is accidentally reformatted so that the other masters can no longer 
connect to it, and no one notices until one of the other masters dies. If we 
had this metric, we could alert on it.
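
As a rough illustration of the kind of alerting meant here, the sketch below 
scans an already-parsed metrics dump and flags entities that have gone too 
long without a valid leader. The metric name `time_since_valid_leader_ms`, the 
input shape (a list of entities, each with an `id` and a `metrics` list of 
name/value pairs), and the threshold are all assumptions made for the example; 
the issue only proposes that such a metric exist.

```python
from typing import Any

# Hypothetical metric name: the issue proposes a metric like this but does
# not fix its name or unit.
METRIC_NAME = "time_since_valid_leader_ms"


def stale_replicas(entities: list[dict[str, Any]], threshold_ms: int) -> list[str]:
    """Return the ids of entities whose leaderless time exceeds threshold_ms.

    `entities` is assumed to be a parsed JSON list in which each entry has an
    "id" and a "metrics" list of {"name": ..., "value": ...} objects.
    """
    stale = []
    for entity in entities:
        for metric in entity.get("metrics", []):
            if metric.get("name") == METRIC_NAME and metric.get("value", 0) > threshold_ms:
                stale.append(entity.get("id", "<unknown>"))
                break  # one match per entity is enough
    return stale


# Made-up sample data, for illustration only.
sample = [
    {"type": "tablet", "id": "tablet-a", "metrics": [{"name": METRIC_NAME, "value": 1200}]},
    {"type": "tablet", "id": "tablet-b", "metrics": [{"name": METRIC_NAME, "value": 95000}]},
]
print(stale_replicas(sample, threshold_ms=60_000))  # ['tablet-b']
```

A monitoring system could run a check like this against each master and 
tablet server and raise an alert whenever the returned list is non-empty.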

> Add replica metric tracking time since there was a valid leader
> ---------------------------------------------------------------
>
>                 Key: KUDU-2287
>                 URL: https://issues.apache.org/jira/browse/KUDU-2287
>             Project: Kudu
>          Issue Type: New Feature
>          Components: ksck, metrics, supportability
>    Affects Versions: 1.7.0
>            Reporter: Todd Lipcon
>            Assignee: Attila Bukor
>            Priority: Major
>
> Currently, monitoring systems can report that the Kudu cluster is perfectly 
> healthy when in fact some tablet has gotten "stuck" with no leader (e.g., due 
> to some network connectivity problem or a bug). If we exposed a numeric metric 
> on a tablet indicating the time since a replica was healthy, or the number of 
> failed election attempts, etc., we could easily monitor for this case.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
