[
https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104971#comment-13104971
]
Todd Lipcon commented on HBASE-4393:
------------------------------------
RPC sampling on the server side won't tell you if, for example, one of the
servers in the cluster has a faulty NIC and is therefore dropping packets and
showing very high latency. Latency measured "inside" the server will look
fast, but every client will see it as slow.
Availability-wise, we sometimes have clusters that see only sporadic
access (e.g. from an MR job that runs every hour). In that case, it's nice to
have a canary monitor to determine whether one of the region servers is having
issues _before_ the job runs and times out. We often find out about these kinds
of issues from a job failing, rather than proactively from monitoring, since all
of the servers are "up", with just one region in some kind of broken state.
> Implement a canary monitoring program
> -------------------------------------
>
> Key: HBASE-4393
> URL: https://issues.apache.org/jira/browse/HBASE-4393
> Project: HBase
> Issue Type: New Feature
> Components: monitoring
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Assignee: Todd Lipcon
>
> This JIRA is to implement a standalone program that can be used to do "canary
> monitoring" of a running HBase cluster. This program would gather a list of
> the regions in the cluster, then iterate over them doing lightweight
> operations (e.g. short scans) to provide latency metrics and to alert
> on availability issues.
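
The loop described above could be sketched roughly as follows. This is only an
illustration of the idea, not the proposed implementation: `run_canary`,
`find_alerts`, `ProbeResult`, and the `probe` callable are all hypothetical
names, and `probe` stands in for whatever lightweight client-side operation
(such as a short scan) is chosen. No real HBase API is used. Timing is taken on
the client side so that network problems such as a faulty NIC show up in the
measured latency.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class ProbeResult:
    region: str
    latency_s: Optional[float]   # None when the probe raised an error
    error: Optional[str] = None  # repr() of the exception, if any


def run_canary(regions: Iterable[str],
               probe: Callable[[str], None]) -> List[ProbeResult]:
    """Probe each region once and record client-observed latency.

    `probe` is a hypothetical callable that performs one lightweight
    operation against the named region and raises on failure.
    """
    results = []
    for region in regions:
        start = time.monotonic()
        try:
            probe(region)
        except Exception as exc:
            # A failed probe is an availability problem, not a latency sample.
            results.append(ProbeResult(region, None, repr(exc)))
            continue
        results.append(ProbeResult(region, time.monotonic() - start))
    return results


def find_alerts(results: Iterable[ProbeResult],
                slow_threshold_s: float) -> List[str]:
    """Turn probe results into alert messages for failed or slow regions."""
    alerts = []
    for r in results:
        if r.error is not None:
            alerts.append(f"UNAVAILABLE {r.region}: {r.error}")
        elif r.latency_s is not None and r.latency_s > slow_threshold_s:
            alerts.append(f"SLOW {r.region}: {r.latency_s:.3f}s")
    return alerts
```

Run periodically (for example, shortly before an hourly MR job), a loop like
this would flag a single broken region even while every server reports as
"up". The threshold value is an assumption to be tuned per cluster.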