Piggyback basic "alarm" framework on RS heartbeats
--------------------------------------------------
Key: HBASE-2629
URL: https://issues.apache.org/jira/browse/HBASE-2629
Project: HBase
Issue Type: New Feature
Components: master, regionserver
Reporter: Todd Lipcon
There are a number of system conditions that can cause HBase to perform badly
or have stability issues. For example, significant swapping activity or an
overloaded ZooKeeper (ZK) ensemble will result in all kinds of problems.
It would be nice to put a very lightweight "alarm" framework in place, so that
when a region server (RS) notices something is amiss, it can raise an alarm
flag for some period of time. These alarms could be exposed via JMX to external
monitoring tools, and also displayed on the master web UI.
Some example alarms:
- "ZK read took >1000ms"
- "Long garbage collection pause detected"
- "Writes blocked on region for longer than 5 seconds"
etc.
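
To make the idea concrete, here is a minimal sketch of what "raise an alarm
flag for some period of time" could look like on the region server side. None
of this is existing HBase code: the class name AlarmRegistry, its methods, and
the injected clock are hypothetical, and the sketch assumes that an alarm is
just a string name with an expiry time. The list returned by activeAlarms() is
what might be piggybacked on a heartbeat or exposed over JMX.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch: alarm flags that stay raised for a fixed period. */
class AlarmRegistry {
    // Alarm name -> time (in ms) at which the alarm stops being active.
    private final Map<String, Long> expiryByAlarm = new ConcurrentHashMap<>();
    // Injected clock so the expiry behaviour can be tested without sleeping.
    private final LongSupplier clockMillis;

    AlarmRegistry(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Raise (or refresh) an alarm that stays active for ttlMillis. */
    void raise(String name, long ttlMillis) {
        expiryByAlarm.put(name, clockMillis.getAsLong() + ttlMillis);
    }

    /** Raise the alarm only if the observed duration exceeded the threshold. */
    boolean raiseIfSlow(String name, long elapsedMillis,
                        long thresholdMillis, long ttlMillis) {
        if (elapsedMillis > thresholdMillis) {
            raise(name, ttlMillis);
            return true;
        }
        return false;
    }

    /** Drop expired alarms and return the names still active, sorted. */
    List<String> activeAlarms() {
        long now = clockMillis.getAsLong();
        expiryByAlarm.values().removeIf(expiry -> expiry <= now);
        List<String> names = new ArrayList<>(expiryByAlarm.keySet());
        Collections.sort(names);
        return names;
    }
}
```

A caller would time an operation, for example a ZK read, and then call
something like raiseIfSlow("ZK read took >1000ms", elapsed, 1000, 60000). In
production the clock would be System::currentTimeMillis. How long each alarm
should stay raised is left open here, as it is in the proposal.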