Piggyback basic "alarm" framework on RS heartbeats
--------------------------------------------------

                 Key: HBASE-2629
                 URL: https://issues.apache.org/jira/browse/HBASE-2629
             Project: HBase
          Issue Type: New Feature
          Components: master, regionserver
            Reporter: Todd Lipcon


A number of system conditions can cause HBase to perform badly or become 
unstable. For example, significant swapping activity or an overloaded 
ZooKeeper (ZK) can cause a wide range of problems.

It would be nice to put a very lightweight "alarm" framework in place, so that 
when a region server (RS) notices something is amiss, it can raise an alarm 
flag for some period of time and piggyback it on its heartbeat to the master. 
These alarms could be exposed via JMX to external monitoring tools and also 
displayed on the master web UI.

Some example alarms:
- "ZK read took >1000ms"
- "Long garbage collection pause detected"
- "Writes blocked on region for longer than 5 seconds"
etc.
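
To make the "raise an alarm flag for some period of time" idea concrete, here 
is a minimal sketch of what the RS-side bookkeeping might look like. It is only 
an illustration: the class AlarmRegistry, its methods, and the injectable clock 
are all hypothetical names, not existing HBase APIs, and the sketch assumes an 
alarm is just a string with an expiry time.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch; none of these names exist in HBase.
final class AlarmRegistry {
    // Alarm name -> wall-clock time (ms) at which the flag expires.
    private final Map<String, Long> expiryByAlarm = new ConcurrentHashMap<>();
    // Injectable clock (assumption) so expiry can be tested deterministically.
    private final LongSupplier clockMillis;

    AlarmRegistry(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    // Raise an alarm, or extend it if it is already raised, so that it
    // stays visible for at least ttlMillis from now.
    void raise(String alarm, long ttlMillis) {
        long expiry = clockMillis.getAsLong() + ttlMillis;
        expiryByAlarm.merge(alarm, expiry, Long::max);
    }

    // Drop expired flags and return the names of the remaining ones, sorted.
    // This is the list an RS could attach to its next heartbeat.
    List<String> activeAlarms() {
        long now = clockMillis.getAsLong();
        expiryByAlarm.values().removeIf(expiry -> expiry <= now);
        List<String> active = new ArrayList<>(expiryByAlarm.keySet());
        Collections.sort(active);
        return active;
    }
}
```

In this sketch, the code that detects a condition would call something like 
raise("ZK read took >1000ms", 60000), and the heartbeat path would call 
activeAlarms() and send the result to the master, which could then show it in 
the web UI or publish it over JMX. How the list is actually serialized into 
the heartbeat is left open here.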
