[ https://issues.apache.org/jira/browse/AMBARI-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yurii Shylov updated AMBARI-9458:
---------------------------------
Attachment: AMBARI-9458.patch
> HDFS, YARN, and HBase Slave Health Alert Definitions
> ----------------------------------------------------
>
> Key: AMBARI-9458
> URL: https://issues.apache.org/jira/browse/AMBARI-9458
> Project: Ambari
> Issue Type: Task
> Components: ambari-server
> Affects Versions: 2.0.0
> Reporter: Yurii Shylov
> Assignee: Yurii Shylov
> Fix For: 2.0.0
>
> Attachments: AMBARI-9458.patch
>
>
> When a slave component, such as a DataNode, encounters a catastrophic
> problem like a heap allocation error and can no longer perform its work, the
> NameNode marks this DataNode as unhealthy.
> The current alert definitions only check whether the DataNode process is
> alive, which it technically still is. We need to add new alert definitions
> for:
> - HDFS/DataNode (runs on NameNode, query is to NameNode JMX)
> - YARN/NodeManager (runs on ResourceManager, query is to ResourceManager JMX)
> - HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)
> These definitions will check for slaves that are in a bad state. Depending on
> the JMX structures that need to be queried, they can be either METRIC or
> SCRIPT style alert definitions.
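
As a rough illustration of the HDFS/DataNode case, the sketch below classifies
slave health from a NameNode JMX response. It is not taken from the attached
patch. It assumes the NameNode's FSNamesystemState bean exposes
NumDeadDataNodes and NumStaleDataNodes counters, and the function name, the
state labels, and the warning/critical thresholds are hypothetical choices made
only for this example.

```python
import json


def classify_datanode_health(jmx_payload, warning=1, critical=2):
    """Classify DataNode health from a NameNode JMX JSON document.

    jmx_payload: JSON text shaped like {"beans": [{...}]}, as a JMX servlet
                 query for a single bean would typically return (assumption).
    warning, critical: hypothetical thresholds on the number of unhealthy
                 DataNodes; they are not defined in the issue.
    Returns a (state, message) tuple.
    """
    beans = json.loads(jmx_payload).get("beans", [])
    if not beans:
        # No data means health cannot be judged either way.
        return "UNKNOWN", "No beans returned from NameNode JMX"

    bean = beans[0]
    # Attribute names are assumed; missing attributes count as zero.
    dead = int(bean.get("NumDeadDataNodes", 0))
    stale = int(bean.get("NumStaleDataNodes", 0))
    unhealthy = dead + stale

    message = "{0} dead, {1} stale DataNode(s)".format(dead, stale)
    if unhealthy >= critical:
        return "CRITICAL", message
    if unhealthy >= warning:
        return "WARNING", message
    return "OK", message


if __name__ == "__main__":
    sample = json.dumps(
        {"beans": [{"NumLiveDataNodes": 3,
                    "NumDeadDataNodes": 1,
                    "NumStaleDataNodes": 0}]}
    )
    print(classify_datanode_health(sample))
```

In a real alert the payload would come from the NameNode's JMX endpoint rather
than a literal string, and the same pattern could be repeated against the
ResourceManager and HBase Master beans. Whether this logic fits a METRIC
definition or needs a SCRIPT depends on how the counters are exposed.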