[ 
https://issues.apache.org/jira/browse/HBASE-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104971#comment-13104971
 ] 

Todd Lipcon commented on HBASE-4393:
------------------------------------

Server-side RPC sampling won't tell you if, for example, one of the servers 
in the cluster has a faulty NIC and is therefore dropping packets and showing 
very high latency. The latency measured "inside" the server will look low, but 
the latency that any client sees will be high.

Availability-wise, we sometimes have clusters which only sporadically see 
access (e.g. from an MR job that runs every hour). In that case, it's nice to 
have a canary monitor to determine whether one of the region servers is having 
issues _before_ the job runs and times out. We often find out about these kinds 
of issues from a failed job rather than proactively from monitoring, since all 
of the servers are "up" and just one region is in some kind of broken state.

> Implement a canary monitoring program
> -------------------------------------
>
>                 Key: HBASE-4393
>                 URL: https://issues.apache.org/jira/browse/HBASE-4393
>             Project: HBase
>          Issue Type: New Feature
>          Components: monitoring
>    Affects Versions: 0.92.0
>            Reporter: Todd Lipcon
>            Assignee: Todd Lipcon
>
> This JIRA is to implement a standalone program that can be used to do "canary 
> monitoring" of a running HBase cluster. This program would gather a list of 
> the regions in the cluster, then iterate over them doing lightweight 
> operations (e.g. short scans) to provide metrics about latency as well as alert 
> on availability issues.
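
The loop described above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the proposed implementation: the
names run_canary, probe, slow_threshold_s and clock are hypothetical, and the
probe callable stands in for whatever lightweight client-side operation (such
as a short scan) the real program would issue against each region. No HBase
client API is used here.

```python
import time
from typing import Callable, Dict, Iterable, List, Tuple


def run_canary(
    regions: Iterable[str],
    probe: Callable[[str], None],
    slow_threshold_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[Dict[str, float], List[str]]:
    """Probe each region once and return (latency per region, alerts).

    `probe` is a hypothetical stand-in for a lightweight client-side
    operation; it should raise an exception if the region cannot be reached.
    """
    latencies: Dict[str, float] = {}
    alerts: List[str] = []
    for region in regions:
        start = clock()
        try:
            probe(region)
        except Exception as exc:
            # Availability problem: the probe failed outright.
            alerts.append(f"{region}: unavailable ({exc})")
            continue
        elapsed = clock() - start
        latencies[region] = elapsed
        if elapsed > slow_threshold_s:
            # Latency problem as seen from the client side.
            alerts.append(f"{region}: slow ({elapsed:.3f}s)")
    return latencies, alerts
```

Because the timing is taken around the client call, a problem between client
and server (such as a faulty NIC) would show up in the measured latency even
when the server's own RPC metrics look healthy.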

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
