[
https://issues.apache.org/jira/browse/KUDU-1506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15924872#comment-15924872
]
Alexey Serbin commented on KUDU-1506:
-------------------------------------
Some more information on this: users might want to observe how long
leader-to-follower replication takes and possibly alert when it exceeds a
certain threshold. The alerting part could be implemented by an external
monitoring system; on the Kudu side it would be enough to expose the
'follower lag' metric as is.
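
To make the idea concrete, here is a minimal sketch of what an external
monitoring check could look like. It is not Kudu code: the ReplicaWrite
record, the follower_lag and lagging helpers, and the threshold value are all
hypothetical, and it assumes the monitor can somehow obtain per-operation WAL
write timestamps from the leader and a follower.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ReplicaWrite:
    # Hypothetical sample: the time (in seconds) at which a replica wrote
    # the operation with the given index to its WAL.
    replica: str
    op_index: int
    wal_write_time: float


def follower_lag(leader: List[ReplicaWrite],
                 follower: List[ReplicaWrite]) -> Dict[int, float]:
    """Per-operation lag: follower WAL write time minus leader WAL write time."""
    leader_times = {w.op_index: w.wal_write_time for w in leader}
    return {
        w.op_index: w.wal_write_time - leader_times[w.op_index]
        for w in follower
        if w.op_index in leader_times  # skip ops the leader sample lacks
    }


def lagging(lags: Dict[int, float], threshold_s: float) -> bool:
    """True if the most recently replicated operation exceeded the threshold."""
    if not lags:
        return False
    return lags[max(lags)] > threshold_s


# Example: the follower wrote op 2 thirty seconds after the leader did.
leader = [ReplicaWrite("leader", 1, 100.0), ReplicaWrite("leader", 2, 101.0)]
follower = [ReplicaWrite("follower", 1, 100.5), ReplicaWrite("follower", 2, 131.0)]
lags = follower_lag(leader, follower)
alert = lagging(lags, threshold_s=10.0)
```

Alerting only on the latest operation is one possible choice; a real check
would more likely look at a window of samples to avoid firing on a single
slow write.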
> Add Consensus "follower lag" metrics
> ------------------------------------
>
> Key: KUDU-1506
> URL: https://issues.apache.org/jira/browse/KUDU-1506
> Project: Kudu
> Issue Type: New Feature
> Components: consensus, metrics
> Affects Versions: 0.9.0
> Reporter: Mike Percy
>
> It would be useful to have metrics that measure the lag time between leader
> WAL writes and follower WAL writes. Imagine that a node in a cluster has a
> very slow disk or is extremely overloaded. That node may constantly be falling
> behind and/or remote bootstrapping. It would help to be able to monitor for
> nodes that are constantly very far behind the leader (many seconds or
> minutes) so that administrators can take a look at these slow machines and
> either remove them from the cluster or fix the underlying issues.