[ 
https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15943620#comment-15943620
 ] 

Andrew Wong commented on KUDU-1506:
-----------------------------------

I added a follower lag metric that measures lag in number of operations rather 
than in time (see commit e4495d662fff444db646f1e5f67cf52c1d901e8b). I will leave 
this issue open in case we also want to add a time-based metric.

> Add Consensus "follower lag" metrics
> ------------------------------------
>
>                 Key: KUDU-1506
>                 URL: https://issues.apache.org/jira/browse/KUDU-1506
>             Project: Kudu
>          Issue Type: New Feature
>          Components: consensus, metrics
>    Affects Versions: 0.9.0
>            Reporter: Mike Percy
>            Assignee: Andrew Wong
>
> It would be useful to have metrics that measure the lag time between leader 
> WAL writes and follower WAL writes. Imagine a node in a cluster that has a very 
> slow disk or is extremely overloaded. That node may constantly be falling 
> behind and/or remote bootstrapping. It would help to be able to monitor for 
> nodes that are consistently far behind the leader (by many seconds or 
> minutes) so that administrators can investigate these slow machines and 
> either remove them from the cluster or fix the underlying issues.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
