-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/30566/
-----------------------------------------------------------
Review request for Ambari, Jonathan Robie and Srimanth Gunturi.
Bugs: AMBARI-9458
https://issues.apache.org/jira/browse/AMBARI-9458
Repository: ambari
Description
-------
When a slave component such as a DataNode hits a catastrophic problem (for
example, a heap allocation error) and can no longer perform its work, the
NameNode marks that DataNode as unhealthy.
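The NameNode's view of a DataNode can be read from its JMX endpoint. The sketch
below assumes the `Hadoop:service=NameNode,name=NameNodeInfo` bean, whose
`DeadNodes` attribute is itself a JSON-encoded string nested inside the JMX
JSON. The sample payload and the `dead_datanodes` helper are hypothetical and
trimmed for illustration; verify the bean and attribute names against the
deployed Hadoop version.

```python
import json

# Hypothetical, trimmed NameNode /jmx response. LiveNodes/DeadNodes are
# assumed to be JSON *strings* inside the JMX JSON, so they are decoded twice.
SAMPLE_JMX = {
    "beans": [
        {
            "name": "Hadoop:service=NameNode,name=NameNodeInfo",
            "LiveNodes": json.dumps({"dn1.example.com": {"lastContact": 1}}),
            "DeadNodes": json.dumps({"dn2.example.com": {"lastContact": 912}}),
        }
    ]
}


def dead_datanodes(jmx):
    """Return the sorted host names the NameNode reports as dead (hypothetical helper)."""
    for bean in jmx.get("beans", []):
        if bean.get("name") == "Hadoop:service=NameNode,name=NameNodeInfo":
            # DeadNodes is a JSON-encoded map of host name -> node attributes.
            return sorted(json.loads(bean.get("DeadNodes") or "{}"))
    # Bean missing: report no dead nodes rather than guessing.
    return []


print(dead_datanodes(SAMPLE_JMX))  # ['dn2.example.com']
```

A process-alive check on dn2 would still pass here, which is exactly the gap
described next.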
The current alert definitions only check that the DataNode process is alive,
which it technically still is. We need to add new alert definitions for:
- HDFS/DataNode (runs on the NameNode, queries NameNode JMX)
- YARN/NodeManager (runs on the ResourceManager, queries ResourceManager JMX)
- HBase/RegionServer (runs on the HBase Master, queries HBase Master JMX)
These definitions will check for slaves that are in a bad state. Depending on
the JMX structures that need to be queried, they can be either METRIC or
SCRIPT style alert definitions.
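As a rough illustration of the evaluation these definitions would perform, the
sketch below reads a "live" and a "bad" counter from a master's JMX payload and
maps the bad fraction to a state. The bean and attribute names in
`SLAVE_COUNTERS` are assumptions to be checked against the deployed Hadoop and
HBase versions, and the percentage thresholds are hypothetical defaults, not
Ambari's. Flat numeric counters like these may be readable by a METRIC style
definition directly; derived logic or nested JSON is where SCRIPT style is
likely needed.

```python
# Assumed (bean name, live counter, bad counter) per slave type; illustrative only.
SLAVE_COUNTERS = {
    "DATANODE": ("Hadoop:service=NameNode,name=FSNamesystemState",
                 "NumLiveDataNodes", "NumDeadDataNodes"),
    "NODEMANAGER": ("Hadoop:service=ResourceManager,name=ClusterMetrics",
                    "NumActiveNMs", "NumUnhealthyNMs"),
    "REGIONSERVER": ("Hadoop:service=HBase,name=Master,sub=Server",
                     "numRegionServers", "numDeadRegionServers"),
}


def evaluate_slaves(jmx, component, warn_pct=10.0, crit_pct=30.0):
    """Map the share of bad slaves to OK/WARNING/CRITICAL (hypothetical thresholds)."""
    bean_name, live_attr, bad_attr = SLAVE_COUNTERS[component]
    bean = next((b for b in jmx.get("beans", []) if b.get("name") == bean_name), None)
    if bean is None:
        return "UNKNOWN", "bean %s not found" % bean_name
    live = int(bean.get(live_attr, 0))
    bad = int(bean.get(bad_attr, 0))
    total = live + bad
    # Avoid division by zero when the master reports no slaves at all.
    pct = 100.0 * bad / total if total else 0.0
    if pct >= crit_pct:
        state = "CRITICAL"
    elif pct >= warn_pct:
        state = "WARNING"
    else:
        state = "OK"
    return state, "%d/%d %s slaves in a bad state (%.1f%%)" % (bad, total, component, pct)


sample = {"beans": [{"name": "Hadoop:service=NameNode,name=FSNamesystemState",
                     "NumLiveDataNodes": 9, "NumDeadDataNodes": 1}]}
print(evaluate_slaves(sample, "DATANODE"))
# ('WARNING', '1/10 DATANODE slaves in a bad state (10.0%)')
```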
Diffs
-----
ambari-server/src/main/resources/common-services/HBASE/0.96.0.2.0/alerts.json (fa911e1)
ambari-server/src/main/resources/common-services/HDFS/2.1.0.2.0/alerts.json (b8a20ac)
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json (dc4fafd)
ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py (PRE-CREATION)
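The contents of alert_nodemanagers_summary.py are not shown in this request.
The following is only a hypothetical sketch of what a NodeManager summary check
could compute, assuming the ResourceManager's
`Hadoop:service=ResourceManager,name=RMNMInfo` bean exposes `LiveNodeManagers`
as a JSON-encoded list of per-node entries with a `State` field. The helper
name `summarize_nodemanagers` and the (state, [text]) result shape are
illustrative choices, not the script's actual entry point.

```python
import json
from collections import Counter

# Hypothetical, trimmed RMNMInfo payload; LiveNodeManagers is assumed to be a
# JSON-encoded list of per-node dicts carrying a "State" field.
SAMPLE_RM_JMX = {
    "beans": [
        {
            "name": "Hadoop:service=ResourceManager,name=RMNMInfo",
            "LiveNodeManagers": json.dumps([
                {"HostName": "nm1.example.com", "State": "RUNNING"},
                {"HostName": "nm2.example.com", "State": "UNHEALTHY"},
                {"HostName": "nm3.example.com", "State": "RUNNING"},
            ]),
        }
    ]
}


def summarize_nodemanagers(jmx):
    """Return a (state, [text]) tuple counting NodeManagers not in RUNNING state."""
    bean = next((b for b in jmx.get("beans", [])
                 if b.get("name") == "Hadoop:service=ResourceManager,name=RMNMInfo"), None)
    if bean is None:
        return "UNKNOWN", ["RMNMInfo bean not found in JMX response"]
    nodes = json.loads(bean.get("LiveNodeManagers") or "[]")
    # Count nodes per reported state; Counter returns 0 for absent keys.
    states = Counter(node.get("State") for node in nodes)
    not_running = len(nodes) - states["RUNNING"]
    text = "%d of %d NodeManagers not RUNNING" % (not_running, len(nodes))
    return ("CRITICAL" if not_running else "OK"), [text]


print(summarize_nodemanagers(SAMPLE_RM_JMX))
# ('CRITICAL', ['1 of 3 NodeManagers not RUNNING'])
```

Treating any non-RUNNING node as critical is a deliberately simple policy for
the sketch; a real definition would more likely use thresholds.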
Diff: https://reviews.apache.org/r/30566/diff/
Testing
-------
In progress
Thanks,
Yurii Shylov