[ 
https://issues.apache.org/jira/browse/AMBARI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Lysnichenko updated AMBARI-7791:
---------------------------------------
    Attachment: AMBARI-7791_branch-1.7.0.patch

> HBase Master CPU utilization alert is not suppressed at MM
> ----------------------------------------------------------
>
>                 Key: AMBARI-7791
>                 URL: https://issues.apache.org/jira/browse/AMBARI-7791
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 1.7.0
>            Reporter: Dmitry Lysnichenko
>            Assignee: Dmitry Lysnichenko
>             Fix For: 1.7.0
>
>         Attachments: AMBARI-7791_branch-1.7.0.patch
>
>
> There appears to be a design flaw in how some alerts are suppressed. It 
> causes a rare bug that probably also affects 1.6.1.
> h2. The short story
> When we put HBase Master (or the entire HBase service) into maintenance mode 
> (MM) and then stop HBase Master, the "HBase Master CPU utilization" alert 
> pops up and is not suppressed. The issue reproduces only when HBase Master 
> is located on a different host than the Nagios server.
> h2. How suppressing alerts works 
> When we put a service, host, or host component into MM, the server builds 
> a complete map of the host components that are in MM and posts it to an 
> agent. The agent writes this information to /var/nagios/ignore.dat, one 
> "hostname service component" entry per line:
> {code}
> vm-3.vm GANGLIA GANGLIA_MONITOR
> vm-0.vm HBASE HBASE_MASTER
> vm-3.vm HDFS DATANODE
> vm-2.vm HBASE HBASE_REGIONSERVER
> vm-0.vm HBASE HBASE_REGIONSERVER
> vm-1.vm HBASE HBASE_REGIONSERVER
> vm-3.vm YARN NODEMANAGER
> vm-3.vm HBASE HBASE_REGIONSERVER
> {code}
> All Nagios alerts are wrapped in a shell script (check_wrapper.sh). When 
> an alert is generated, the wrapper checks whether the alert's hostname, 
> service name, and component name are present in /var/nagios/ignore.dat. If 
> they are, the alert is suppressed.
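
The lookup described above can be sketched in Python as follows. This is only
an illustration, not the actual check_wrapper.sh (which is a shell script):
the names load_ignored and is_suppressed are hypothetical, and the
whitespace-separated three-field layout is assumed from the sample entries.

```python
def load_ignored(lines):
    """Parse ignore.dat-style lines into a set of (host, service, component)."""
    ignored = set()
    for line in lines:
        fields = line.split()
        if len(fields) == 3:  # skip blank or malformed lines
            ignored.add(tuple(fields))
    return ignored


def is_suppressed(ignored, host, service, component):
    """Return True only when the alert origin matches an entry exactly."""
    return (host, service, component) in ignored


# A few entries copied from the sample ignore.dat content.
SAMPLE = """\
vm-3.vm GANGLIA GANGLIA_MONITOR
vm-0.vm HBASE HBASE_MASTER
vm-3.vm HDFS DATANODE
"""

ignored = load_ignored(SAMPLE.splitlines())
print(is_suppressed(ignored, "vm-0.vm", "HBASE", "HBASE_MASTER"))  # True
print(is_suppressed(ignored, "vm-1.vm", "HBASE", "HBASE_MASTER"))  # False
```

Because the match is exact on all three fields, an alert whose origin differs
in any one of them is not suppressed.
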
> h2. What exactly is broken
> In https://issues.apache.org/jira/browse/AMBARI-6358 we had a 
> requirement to have only one 'HBase Master CPU utilization' check even in HA 
> mode. The check is therefore bound to the Nagios host, so that it is executed 
> only once even if the hbase master hostgroup has more than one host (as is 
> done for the "* Percent Count" alerts). As a result, the HBase Master alert's 
> origin data does not match any entry in /var/nagios/ignore.dat, which is why 
> the alert is not suppressed.
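
The mismatch can be illustrated with a small Python sketch. It rests on
assumptions that are not stated in the issue: that the hostname is the field
that differs, and that the Nagios server runs on a host called nagios.vm (a
made-up name). The helper is_suppressed is hypothetical and only mimics the
exact-match lookup that the wrapper is described as performing.

```python
# Entry written for the HBase Master that is in MM (taken from the sample).
ignored = {("vm-0.vm", "HBASE", "HBASE_MASTER")}

# Hypothetical host name for the Nagios server; not taken from the issue.
NAGIOS_HOST = "nagios.vm"


def is_suppressed(origin):
    """Exact-match lookup of (host, service, component) in the ignore set."""
    return origin in ignored


# Origin of a check that runs per HBase Master host.
per_host_origin = ("vm-0.vm", "HBASE", "HBASE_MASTER")
# Origin of a check bound to the Nagios host (assumed shape).
bound_to_nagios = (NAGIOS_HOST, "HBASE", "HBASE_MASTER")

print(is_suppressed(per_host_origin))  # True
print(is_suppressed(bound_to_nagios))  # False
```

Under these assumptions the second lookup fails only because the host differs,
which is consistent with the issue reproducing only when HBase Master and the
Nagios server are on different hosts.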



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
