Jonathan Hurley created AMBARI-8569:
---------------------------------------
Summary: Alert JSON Files Need Descriptions
Key: AMBARI-8569
URL: https://issues.apache.org/jira/browse/AMBARI-8569
Project: Ambari
Issue Type: Task
Components: alerts
Affects Versions: 2.0.0
Reporter: Jonathan Hurley
Assignee: Jonathan Hurley
Fix For: 2.0.0
BUG-28018 adds a new {{description}} field to an alert definition. The
{{alerts.json}} files for every service in every stack should be updated to
have this field for each alert definition.
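A minimal sketch for finding definitions that still lack the new field. It assumes each {{alerts.json}} maps a service name to component keys (or a {{service}} key), and that each key holds a list of definition objects with a {{name}}. The function names, the directory that gets scanned, and the sample definition names are hypothetical and are not part of this issue.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, List, Tuple


def definitions(alerts: dict) -> Iterator[Tuple[str, str, dict]]:
    """Yield (service, group, definition) for every alert definition.

    Assumed layout: {SERVICE: {COMPONENT_or_"service": [definition, ...]}}.
    """
    for service, groups in alerts.items():
        for group, defs in groups.items():
            for definition in defs:
                yield service, group, definition


def missing_descriptions(alerts: dict) -> List[str]:
    """Return 'SERVICE/group/name' for each definition without a non-blank description."""
    missing = []
    for service, group, definition in definitions(alerts):
        if not str(definition.get("description", "")).strip():
            missing.append(f"{service}/{group}/{definition.get('name', '?')}")
    return missing


def scan(stack_root: str) -> Dict[str, List[str]]:
    """Check every alerts.json found under a (hypothetical) stack directory."""
    report = {}
    for path in sorted(Path(stack_root).rglob("alerts.json")):
        with path.open() as handle:
            problems = missing_descriptions(json.load(handle))
        if problems:
            report[str(path)] = problems
    return report


if __name__ == "__main__":
    # Hypothetical definition names, for illustration only.
    sample = {
        "HDFS": {
            "NAMENODE": [
                {"name": "namenode_process", "description": "This host-level alert ..."},
                {"name": "namenode_webui"},
            ]
        }
    }
    print(missing_descriptions(sample))  # ['HDFS/NAMENODE/namenode_webui']
```

Blank or whitespace-only descriptions are reported as missing, so a placeholder such as {{""}} does not satisfy the check.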
||Alert||Service||Description||
|DataNode Process | HDFS | This host-level alert is triggered if the individual
DataNode process cannot be confirmed to be up and listening on the network
for the configured critical threshold.|
|NameNode Process | HDFS | This host-level alert is triggered if the NameNode
process cannot be confirmed to be up and listening on the network for the
configured critical threshold.|
|NameNode Host CPU Utilization | HDFS |This host-level alert is triggered if
CPU utilization of the NameNode exceeds certain warning and critical
thresholds. It checks the NameNode JMX Servlet for the SystemCPULoad property. |
|NameNode Blocks Health | HDFS | This service-level alert is triggered if the
number of corrupt or missing blocks exceeds the configured critical threshold.|
|DataNode Storage | HDFS | This host-level alert is triggered if storage
capacity is full on the DataNode. It checks the DataNode JMX Servlet for the
Capacity and Remaining properties.|
|NameNode Web UI | HDFS | This host-level alert is triggered if the NameNode
Web UI is unreachable.|
|Percent DataNodes With Available Space | HDFS | This service-level alert is
triggered if the percentage of DataNodes whose storage is full exceeds the
configured warning and critical thresholds.|
|Percent DataNodes Available | HDFS | This alert is triggered if the number of
down DataNodes in the cluster is greater than the configured critical
threshold. It aggregates the results of DataNode process checks.|
|NameNode RPC Latency | HDFS | This host-level alert is triggered if the NameNode
operations RPC latency exceeds the configured critical threshold. Typically, an
increase in the RPC processing time increases the RPC queue length, causing the
average queue wait time to increase for NameNode operations.|
|HDFS Capacity Utilization | HDFS |This service-level alert is triggered if the
HDFS capacity utilization exceeds the configured warning and critical
thresholds. It checks the NameNode JMX Servlet for the CapacityUsed and
CapacityRemaining properties.|
|DataNode Web UI | HDFS | This host-level alert is triggered if the DataNode
Web UI is unreachable.|
|Secondary NameNode Process | HDFS | This host-level alert is triggered if the
Secondary NameNode process cannot be confirmed to be up and listening on the
network for the configured critical threshold.|
|JournalNode Process | HDFS | This host-level alert is triggered if the
JournalNode process cannot be confirmed to be up and listening on the network
for the configured critical threshold.|
|ZooKeeper Failover Controller Process | HDFS | This host-level alert is
triggered if the ZooKeeper Failover Controller process cannot be confirmed to
be up and listening on the network for the configured critical threshold.|
|Percent JournalNodes Available | HDFS | This alert is triggered if the number
of down JournalNodes in the cluster is greater than the configured critical
threshold. It aggregates the results of JournalNode process checks.|
|NameNode High Availability Health | HDFS | This service-level alert is
triggered if either the Active NameNode or the Standby NameNode is not running.|
|History Server Process | MAPREDUCE2 | This host-level alert is triggered if
the HistoryServer process cannot be confirmed to be up and listening on the
network for the configured critical threshold.|
|History Server RPC Latency | MAPREDUCE2 | This host-level alert is triggered if
the HistoryServer operations RPC latency exceeds the configured critical
threshold. Typically, an increase in the RPC processing time increases the RPC
queue length, causing the average queue wait time to increase for HistoryServer
operations.|
|History Server CPU Utilization | MAPREDUCE2 | This host-level alert is
triggered if CPU utilization on the HistoryServer exceeds the configured
critical threshold.|
|History Server Web UI | MAPREDUCE2 | This host-level alert is triggered if the
HistoryServer Web UI is unreachable. |
|ZooKeeper Server Process | ZOOKEEPER | This host-level alert is triggered if
the ZooKeeper server process cannot be determined to be up and listening on the
network for the configured critical threshold.|
|Percent ZooKeeper Servers Available | ZOOKEEPER |This service-level alert is
triggered if the configured percentage of ZooKeeper processes cannot be
determined to be up and listening on the network for the configured critical
threshold. It aggregates the results of ZooKeeper process checks.|
|ResourceManager RPC Latency | YARN | This host-level alert is triggered if the
ResourceManager operations RPC latency exceeds the configured critical
threshold. Typically an increase in the RPC processing time increases the RPC
queue length, causing the average queue wait time to increase for
ResourceManager operations.|
|ResourceManager CPU Utilization | YARN | This host-level alert is triggered if
CPU utilization of the ResourceManager exceeds certain warning and critical
thresholds. It checks the ResourceManager JMX Servlet for the SystemCPULoad
property.|
|NodeManager Health | YARN | This host-level alert checks the node health
property available from the NodeManager component.|
|Percent NodeManagers Available | YARN | This alert is triggered if the number
of down NodeManagers in the cluster is greater than the configured critical
threshold. It aggregates the results of NodeManager process checks. |
|ResourceManager Web UI | YARN | This host-level alert is triggered if the
ResourceManager Web UI is unreachable.|
|App Timeline Web UI | YARN | This host-level alert is triggered if the App
Timeline Server Web UI is unreachable.|
|NodeManager Web UI | YARN |This host-level alert is triggered if the
NodeManager Web UI is unreachable.|
|NameNode Last Checkpoint | HDFS | This alert checks the last time the NameNode
performed a checkpoint. The alert script also checks the number of
uncommitted transactions.|
|NameNode Directory Status | HDFS | This alert checks the NameNode JMX Servlet
for the NameDirStatuses metric to see if any directories report a failure.|
|Percent RegionServers process|HBASE|This service-level alert is triggered if
the configured percentage of RegionServer processes cannot be determined to be
up and listening on the network for the configured warning and critical
thresholds. It aggregates the results of RegionServer process down checks.|
|Percent HBase Master process|HBASE|This alert is triggered if the HBase Master
processes cannot be confirmed to be up and listening on the network for the
configured critical threshold, given in seconds.|
|HBase Master Web UI|HBASE|This host-level alert is triggered if the HBase
Master Web UI is unreachable.|
|Percent HBase Master CPU utilization|HBASE|This host-level alert is triggered
if CPU utilization of the HBase Master exceeds certain warning and critical
thresholds. It checks the HBase Master JMX Servlet for the SystemCPULoad
property.|
|RegionServer process|HBASE|This host-level alert is triggered if the
RegionServer process cannot be confirmed to be up and listening on the
network for the configured critical threshold, given in seconds.|
|Hive Metastore status|HIVE|This host-level alert is triggered if the Hive
Metastore process cannot be determined to be up and listening on the network
for the configured critical threshold.|
|WebHCat Server process|HIVE|This host-level alert is triggered if the WebHCat
server cannot be determined to be up and responding to client requests.|
|Oozie Server process|OOZIE|This host-level alert is triggered if the Oozie
server cannot be determined to be up and responding to client requests.|
|Knox Gateway process|KNOX|This host-level alert is triggered if the Knox
Gateway cannot be determined to be up.|
|Kafka Broker process|KAFKA|This host-level alert is triggered if the Kafka
Broker cannot be determined to be up.|
|Falcon Server Web UI|FALCON|This host-level alert is triggered if the Falcon
Server Web UI is unreachable.|
|Falcon Server process|FALCON|This host-level alert is triggered if the
Falcon Server cannot be determined to be up.|
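Several descriptions above (for example Percent DataNodes Available, Percent JournalNodes Available and HDFS Capacity Utilization) reduce a measurement to a percentage and compare it with warning and critical thresholds. The sketch below illustrates only that idea. The function names, the threshold values, the ">=" comparison and the used/(used + remaining) formula are assumptions, not Ambari's implementation.

```python
from enum import Enum
from typing import List


class State(Enum):
    OK = "OK"
    WARNING = "WARNING"
    CRITICAL = "CRITICAL"


def classify(percent: float, warning: float, critical: float) -> State:
    """Map a percentage onto a state; assumes a value at or above a threshold trips it."""
    if percent >= critical:
        return State.CRITICAL
    if percent >= warning:
        return State.WARNING
    return State.OK


def percent_down(process_states: List[State]) -> float:
    """Aggregate per-host process checks: share of hosts whose check reported CRITICAL."""
    if not process_states:
        return 0.0
    down = sum(1 for state in process_states if state is State.CRITICAL)
    return 100.0 * down / len(process_states)


def capacity_used_percent(capacity_used: float, capacity_remaining: float) -> float:
    """Utilization from CapacityUsed and CapacityRemaining (assumed formula)."""
    total = capacity_used + capacity_remaining
    return 0.0 if total == 0 else 100.0 * capacity_used / total


if __name__ == "__main__":
    # Hypothetical thresholds, for illustration only.
    hosts = [State.OK, State.OK, State.CRITICAL, State.OK]
    print(classify(percent_down(hosts), warning=10, critical=30))  # State.WARNING (25% down)
    print(classify(capacity_used_percent(80, 20), warning=75, critical=90))  # State.WARNING (80% used)
```

Keeping the aggregation separate from the threshold comparison lets the same comparison serve both the "percent down" style alerts and the utilization style alerts.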