Jonathan Hurley created AMBARI-8569:
---------------------------------------

             Summary: Alert JSON Files Need Descriptions
                 Key: AMBARI-8569
                 URL: https://issues.apache.org/jira/browse/AMBARI-8569
             Project: Ambari
          Issue Type: Task
          Components: alerts
    Affects Versions: 2.0.0
            Reporter: Jonathan Hurley
            Assignee: Jonathan Hurley
             Fix For: 2.0.0


BUG-28018 adds a new {{description}} field to an alert definition. The 
{{alerts.json}} files for every service in every stack should be updated to 
have this field for each alert definition.
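
As a rough illustration of the update described above, the sketch below walks a parsed {{alerts.json}} structure, fills in {{description}} from a label-to-description mapping, and reports any definitions still lacking one. The nesting (service name, then component name, then a list of definitions) and the {{label}} key are assumptions about the file layout rather than details taken from this issue; {{add_descriptions}} and the sample data are hypothetical.

```python
import json


def add_descriptions(alerts, descriptions):
    """Set "description" on each definition whose label has an entry.

    Assumed (hypothetical) layout: {service: {component: [definition, ...]}}.
    Returns the labels of definitions that still have no description.
    """
    missing = []
    for components in alerts.values():
        for definitions in components.values():
            for definition in definitions:
                label = definition.get("label")
                # Only fill in the field when it is absent and a text is known.
                if "description" not in definition and label in descriptions:
                    definition["description"] = descriptions[label]
                if "description" not in definition:
                    missing.append(label)
    return missing


# Hypothetical sample data; a real alerts.json has more fields per definition.
sample = {
    "HDFS": {
        "NAMENODE": [{"label": "NameNode Web UI"}],
        "DATANODE": [{"label": "DataNode Web UI"}],
    }
}
descriptions = {
    "NameNode Web UI": (
        "This host-level alert is triggered if the NameNode Web UI "
        "is unreachable."
    ),
}

still_missing = add_descriptions(sample, descriptions)
print(json.dumps(sample, indent=2))
print(still_missing)
```

Returning the list of unfilled labels makes it easy to check that every alert definition in every stack ends up with a description.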




|DataNode Process | HDFS | This host-level alert is triggered if the individual 
DataNode processes cannot be established to be up and listening on the network 
for the configured critical threshold.|
|NameNode Process | HDFS | This host-level alert is triggered if the NameNode 
process cannot be confirmed to be up and listening on the network for the 
configured critical threshold.|         
|NameNode Host CPU Utilization | HDFS |This host-level alert is triggered if 
CPU utilization of the NameNode exceeds certain warning and critical 
thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad property. |
|NameNode Blocks Health | HDFS | This service-level alert is triggered if the 
number of corrupt or missing blocks exceeds the configured critical threshold.|
|DataNode Storage| HDFS | This host-level alert is triggered if storage 
capacity is full on the DataNode. It checks the DataNode JMX Servlet for the 
Capacity and Remaining properties. |
|NameNode Web UI | HDFS | This host-level alert is triggered if the NameNode 
Web UI is unreachable.|    
|Percent DataNodes With Available Space | HDFS | This service-level alert is 
triggered if the percentage of DataNodes with full storage exceeds the 
configured warning and critical thresholds. |
|Percent DataNodes Available | HDFS | This alert is triggered if the number of 
down DataNodes in the cluster is greater than the configured critical 
threshold. It aggregates the results of DataNode process checks.|
|NameNode RPC Latency | HDFS |This host-level alert is triggered if the NameNode 
operations RPC latency exceeds the configured critical threshold. Typically an 
increase in the RPC processing time increases the RPC queue length, causing the 
average queue wait time to increase for NameNode operations.|
|HDFS Capacity Utilization | HDFS |This service-level alert is triggered if the 
HDFS capacity utilization exceeds the configured warning and critical 
thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and 
CapacityRemaining properties.|
|DataNode Web UI | HDFS | This host-level alert is triggered if the DataNode 
Web UI is unreachable.|
|Secondary NameNode Process | HDFS | This host-level alert is triggered if the 
Secondary NameNode process cannot be confirmed to be up and listening on the 
network for the configured critical threshold.|
|JournalNode Process | HDFS |This host-level alert is triggered if the 
JournalNode process cannot be confirmed to be up and listening on the network 
for the configured critical threshold.|
|ZooKeeper Failover Controller Process | HDFS | This host-level alert is 
triggered if the ZooKeeper Failover Controller process cannot be confirmed to 
be up and listening on the network for the configured critical threshold.|
|Percent JournalNodes Available | HDFS | This alert is triggered if the number 
of down JournalNodes in the cluster is greater than the configured critical 
threshold. It aggregates the results of JournalNode process checks.|
|NameNode High Availability Health | HDFS | This service-level alert is 
triggered if either the Active NameNode or Standby NameNode is not running. |
|History Server Process | MAPREDUCE2 |  This host-level alert is triggered if 
the HistoryServer process cannot be established to be up and listening on the 
network for the configured critical threshold.|
|History Server RPC Latency | MAPREDUCE2 |This host-level alert is triggered if 
the HistoryServer operations RPC latency exceeds the configured critical 
threshold. Typically an increase in the RPC processing time increases the RPC 
queue length, causing the average queue wait time to increase for operations.|
|History Server CPU Utilization | MAPREDUCE2 | This host-level alert is 
triggered if the percent of CPU utilization on the HistoryServer exceeds the 
configured critical threshold.|
|History Server Web UI | MAPREDUCE2 | This host-level alert is triggered if the 
HistoryServer Web UI is unreachable.  | 
|ZooKeeper Server Process | ZOOKEEPER | This host-level alert is triggered if 
the ZooKeeper server process cannot be determined to be up and listening on the 
network for the configured critical threshold.|   
|Percent ZooKeeper Servers Available | ZOOKEEPER |This service-level alert is 
triggered if the configured percentage of ZooKeeper processes cannot be 
determined to be up and listening on the network for the configured critical 
threshold. It aggregates the results of ZooKeeper process checks.|
|ResourceManager RPC Latency | YARN | This host-level alert is triggered if the 
ResourceManager operations RPC latency exceeds the configured critical 
threshold. Typically an increase in the RPC processing time increases the RPC 
queue length, causing the average queue wait time to increase for 
ResourceManager operations.|
|ResourceManager CPU Utilization | YARN | This host-level alert is triggered if 
CPU utilization of the ResourceManager exceeds certain warning and critical 
thresholds. It checks the ResourceManager JMX Servlet for the SystemCPULoad 
property.|
|NodeManager Health | YARN | This host-level alert checks the node health 
property available from the NodeManager component.|
|Percent NodeManagers Available | YARN | This alert is triggered if the number 
of down NodeManagers in the cluster is greater than the configured critical 
threshold. It aggregates the results of NodeManager process checks. |        
|ResourceManager Web UI | YARN  | This host-level alert is triggered if the 
ResourceManager Web UI is unreachable.|
|App Timeline Web UI | YARN |   This host-level alert is triggered if the App 
Timeline Server Web UI is unreachable.|
|NodeManager Web UI | YARN |This host-level alert is triggered if the 
NodeManager Web UI is unreachable.|
|NameNode Last Checkpoint | HDFS |This alert checks the last time that the 
NameNode performed a checkpoint. The script also checks the number of 
uncommitted transactions.|
|NameNode Directory Status | HDFS |This alert checks the NameNode JMX Servlet 
for the NameDirStatuses metric to see if any directories report a failure.|

|Percent RegionServers process|HBASE|This service-level alert is triggered if 
the configured percentage of RegionServer processes cannot be determined to be 
up and listening on the network for the configured warning and critical 
thresholds. It aggregates the results of RegionServer process down checks.|
|Percent HBase Master process|HBASE|This alert is triggered if the HBase master 
processes cannot be confirmed to be up and listening on the network for the 
configured critical threshold, given in seconds. |
|HBase Master Web UI|HBASE|This host-level alert is triggered if the HBase 
Master Web UI is unreachable.|
|Percent HBase Master CPU utilization|HBASE|This host-level alert is triggered 
if CPU utilization of the HBase Master exceeds certain warning and critical 
thresholds. It checks the HBase Master JMX Servlet for the SystemCPULoad 
property.|
|RegionServer process|HBASE|This host-level alert is triggered if the 
RegionServer processes cannot be confirmed to be up and listening on the 
network for the configured critical threshold, given in seconds.|

|Hive Metastore status|HIVE|This host-level alert is triggered if the Hive 
Metastore process cannot be determined to be up and listening on the network 
for the configured critical threshold.|
|WebHCat Server process|HIVE|This host-level alert is triggered if the WebHCat 
server cannot be determined to be up and responding to client requests.|

|Oozie Server process|OOZIE|This host-level alert is triggered if the Oozie 
server cannot be determined to be up and responding to client requests.|

|Knox Gateway process|KNOX|This host-level alert is triggered if the Knox 
Gateway cannot be determined to be up.|
|Kafka Broker process|KAFKA|This host-level alert is triggered if the Kafka 
Broker cannot be determined to be up.|

|Falcon Server Web UI|FALCON|This host-level alert is triggered if the Falcon 
Server Web UI is unreachable.|
|Falcon Server process UI|FALCON|This host-level alert is triggered if the 
Falcon Server cannot be determined to be up.|




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
