[ 
https://issues.apache.org/jira/browse/AMBARI-9458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yurii Shylov updated AMBARI-9458:
---------------------------------
    Attachment: AMBARI-9458.patch

> HDFS, YARN, and HBase Slave Health Alert Definitions
> ----------------------------------------------------
>
>                 Key: AMBARI-9458
>                 URL: https://issues.apache.org/jira/browse/AMBARI-9458
>             Project: Ambari
>          Issue Type: Task
>          Components: ambari-server
>    Affects Versions: 2.0.0
>            Reporter: Yurii Shylov
>            Assignee: Yurii Shylov
>             Fix For: 2.0.0
>
>         Attachments: AMBARI-9458.patch
>
>
> When a slave component, such as a DataNode, encounters a catastrophic 
> problem such as a heap allocation error and can no longer perform its work, 
> the NameNode marks that DataNode as unhealthy.
> The current alert definitions only check whether the DataNode process is 
> alive, which it technically still is. We need to add new alert definitions 
> for:
> - HDFS/DataNode (runs on NameNode, query is to NameNode JMX)
> - YARN/NodeManager (runs on ResourceManager, query is to ResourceManager JMX)
> - HBase/RegionServer (runs on HBase Master, queries HBase Master JMX)
> These definitions will check for slaves that are in a bad state. Depending on 
> the JMX structures that need to be queried, they can be either METRIC or 
> SCRIPT style alert definitions.
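
To illustrate the DataNode case, here is a minimal sketch of the kind of check a SCRIPT style definition might run against NameNode JMX. It is not taken from the attached patch. The bean name and the `NumDeadDataNodes` / `NumLiveDataNodes` attributes are assumptions about what the NameNode exposes and should be verified against the `/jmx` output of the cluster; the function names, the threshold, and the example address are hypothetical. Dead-node counts are only one possible signal of a slave in a bad state.

```python
import json
from urllib.request import urlopen

# Assumed bean name; confirm it against your NameNode's /jmx output.
BEAN_NAME = "Hadoop:service=NameNode,name=FSNamesystemState"


def evaluate_datanode_health(jmx_payload, critical_threshold=1):
    """Return (state, message) from a parsed NameNode /jmx JSON document.

    critical_threshold is a hypothetical tuning knob: the number of dead
    DataNodes at which the check reports CRITICAL.
    """
    for bean in jmx_payload.get("beans", []):
        if bean.get("name") != BEAN_NAME:
            continue
        # Attribute names are assumptions, not confirmed by the issue text.
        dead = bean.get("NumDeadDataNodes")
        live = bean.get("NumLiveDataNodes")
        if dead is None or live is None:
            return "UNKNOWN", "DataNode counts missing from JMX bean"
        if dead >= critical_threshold:
            return "CRITICAL", "%d dead DataNode(s), %d live" % (dead, live)
        return "OK", "All %d DataNode(s) live" % live
    return "UNKNOWN", "Bean %s not found" % BEAN_NAME


def fetch_jmx(namenode_http_address):
    """Fetch the bean as JSON, e.g. from a hypothetical 'nn.example.com:50070'."""
    url = "http://%s/jmx?qry=%s" % (namenode_http_address, BEAN_NAME)
    with urlopen(url, timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the evaluation separate from the HTTP fetch means the same logic could be pointed at the ResourceManager or HBase Master JMX by swapping the bean and attribute names, once those are known.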



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
