[
https://issues.apache.org/jira/browse/AMBARI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14172647#comment-14172647
]
Hadoop QA commented on AMBARI-7791:
-----------------------------------
{color:green}+1 overall{color}. Here are the results of testing the latest
attachment
http://issues.apache.org/jira/secure/attachment/12675040/AMBARI-7791.patch
against trunk revision .
{color:green}+1 @author{color}. The patch does not contain any @author
tags.
{color:green}+1 tests included{color}. The patch appears to include 5 new
or modified test files.
{color:green}+1 javac{color}. The applied patch does not increase the
total number of javac compiler warnings.
{color:green}+1 release audit{color}. The applied patch does not increase
the total number of release audit warnings.
{color:green}+1 core tests{color}. The patch passed unit tests in
ambari-server.
Test results:
https://builds.apache.org/job/Ambari-trunk-test-patch/211//testReport/
Console output:
https://builds.apache.org/job/Ambari-trunk-test-patch/211//console
This message is automatically generated.
> HBase Master CPU utilization alert is not suppressed at MM
> ----------------------------------------------------------
>
> Key: AMBARI-7791
> URL: https://issues.apache.org/jira/browse/AMBARI-7791
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 1.7.0
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Fix For: 1.7.0
>
> Attachments: AMBARI-7791.patch, AMBARI-7791_branch-1.7.0.patch
>
>
> Looks like we have a design flaw that affects suppressing some alerts. It
> causes a rare bug that probably affects 1.6.1.
> h2. The short story
> When we put HBase Master (or the entire HBase service) into Maintenance Mode
> (MM) and then stop HBase Master, the alert "HBase Master CPU utilization" pops
> up and is not suppressed. This issue reproduces only when HBase Master is
> located on a different host than the Nagios server.
> h2. How suppressing alerts works
> When we put a service, host, or host component into MM, the server builds
> a complete map of the host components that are in MM and posts it to an agent.
> The agent writes this info to the file /var/nagios/ignore.dat in the form:
> {code}
> vm-3.vm GANGLIA GANGLIA_MONITOR
> vm-0.vm HBASE HBASE_MASTER
> vm-3.vm HDFS DATANODE
> vm-2.vm HBASE HBASE_REGIONSERVER
> vm-0.vm HBASE HBASE_REGIONSERVER
> vm-1.vm HBASE HBASE_REGIONSERVER
> vm-3.vm YARN NODEMANAGER
> vm-3.vm HBASE HBASE_REGIONSERVER
> {code}
> All Nagios alerts are wrapped in a shell script (check_wrapper.sh). When
> an alert is generated, this wrapper checks whether the hostname, service name,
> and component name for the alert are present in /var/nagios/ignore.dat. If
> so, the alert is suppressed.
> h2. What exactly is broken
> In https://issues.apache.org/jira/browse/AMBARI-6358 we had a
> requirement to have only one 'HBase Master CPU utilization' check even in HA
> mode. So this check is bound to the Nagios host (so that it is executed only
> once even if the HBase Master hostgroup has more than one host, as is done for
> the "* Percent Count" alerts). As a result, the origin data of the HBase
> Master alert does not match any entry in /var/nagios/ignore.dat. That's why
> the alert is not suppressed.
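
The mismatch described above can be illustrated with a minimal Python sketch. It is not the actual check_wrapper.sh logic: it assumes the wrapper does an exact match on the (hostname, service, component) triple, and the names parse_ignore, is_suppressed, and the host nagios.vm are hypothetical, introduced only for illustration. The sample data is the ignore.dat listing from the description.

```python
# Sample contents of /var/nagios/ignore.dat, copied from the issue description.
IGNORE_DAT = """\
vm-3.vm GANGLIA GANGLIA_MONITOR
vm-0.vm HBASE HBASE_MASTER
vm-3.vm HDFS DATANODE
vm-2.vm HBASE HBASE_REGIONSERVER
vm-0.vm HBASE HBASE_REGIONSERVER
vm-1.vm HBASE HBASE_REGIONSERVER
vm-3.vm YARN NODEMANAGER
vm-3.vm HBASE HBASE_REGIONSERVER
"""


def parse_ignore(text):
    """Return the set of (host, service, component) triples listed in the file."""
    entries = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3:  # skip blank or malformed lines
            entries.add(tuple(parts))
    return entries


def is_suppressed(entries, host, service, component):
    """Assumed wrapper behaviour: suppress only on an exact triple match."""
    return (host, service, component) in entries


entries = parse_ignore(IGNORE_DAT)

# An alert whose origin is the HBase Master host itself matches an entry.
per_host = is_suppressed(entries, "vm-0.vm", "HBASE", "HBASE_MASTER")

# An alert bound to the Nagios host (hypothetical name "nagios.vm") does not,
# even though HBASE_MASTER on vm-0.vm is in MM.
nagios_bound = is_suppressed(entries, "nagios.vm", "HBASE", "HBASE_MASTER")

print(per_host, nagios_bound)
```

Under these assumptions the first lookup matches and the second does not, which corresponds to the unsuppressed alert seen when HBase Master and the Nagios server are on different hosts.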
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)