[
https://issues.apache.org/jira/browse/AMBARI-7791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14174608#comment-14174608
]
Hudson commented on AMBARI-7791:
--------------------------------
FAILURE: Integrated in Ambari-trunk-Commit-docker #6 (See
[https://builds.apache.org/job/Ambari-trunk-Commit-docker/6/])
AMBARI-7791. HBase Master CPU utilization alert is not suppressed at MM
(dlysnichenko) (dlysnichenko:
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=7092d80d32a39e80b4f0d71c645a48f9a2090889)
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/files/mm_wrapper.py
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/files/check_wrapper.sh
* ambari-server/src/test/python/unitTests.py
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/nagios.cfg.j2
* ambari-server/src/test/python/stacks/1.3.2/NAGIOS/test_mm_wrapper.py
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/templates/nagios.cfg.j2
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/files/mm_wrapper.py
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/templates/hadoop-commands.cfg.j2
* ambari-server/src/test/python/stacks/2.0.6/NAGIOS/test_mm_wrapper.py
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/scripts/nagios_server_config.py
* ambari-server/src/test/python/stacks/1.3.2/NAGIOS/test_nagios_server.py
* ambari-server/src/main/python/ambari-server.py
* ambari-server/src/test/python/stacks/2.0.6/NAGIOS/test_nagios_server.py
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/files/check_checkpoint_time.py
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/files/check_wrapper.sh
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/hadoop-services.cfg.j2
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/templates/hadoop-commands.cfg.j2
* ambari-server/src/main/resources/stacks/HDP/2.0.6/services/NAGIOS/package/scripts/nagios_server_config.py
* ambari-server/src/main/resources/stacks/HDP/1.3.2/services/NAGIOS/package/templates/hadoop-services.cfg.j2
> HBase Master CPU utilization alert is not suppressed at MM
> ----------------------------------------------------------
>
> Key: AMBARI-7791
> URL: https://issues.apache.org/jira/browse/AMBARI-7791
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 1.7.0
> Reporter: Dmitry Lysnichenko
> Assignee: Dmitry Lysnichenko
> Fix For: 1.7.0
>
> Attachments: AMBARI-7791.patch, AMBARI-7791.patch.1,
> AMBARI-7791.patch.2, AMBARI-7791_branch-1.7.0.patch,
> AMBARI-7791_branch-1.7.0.patch.1, AMBARI-7791_branch-1.7.0.patch.2
>
>
> Looks like we have a design flaw that affects suppressing some alerts. It
> causes a rare bug that probably affects 1.6.1.
> h2. The short story
> When we put HBase Master (or the entire HBase service) into maintenance
> mode (MM) and then stop HBase Master, the alert "HBase Master CPU
> utilization" pops up and is not suppressed. The issue reproduces only when
> HBase Master is located on a different host than the Nagios server.
> h2. How suppressing alerts works
> When a service, host, or host component is put into MM, the server builds
> a complete map of the host components that are in MM and posts it to an
> agent. The agent writes this information to the file
> /var/nagios/ignore.dat in the following form:
> {code}
> vm-3.vm GANGLIA GANGLIA_MONITOR
> vm-0.vm HBASE HBASE_MASTER
> vm-3.vm HDFS DATANODE
> vm-2.vm HBASE HBASE_REGIONSERVER
> vm-0.vm HBASE HBASE_REGIONSERVER
> vm-1.vm HBASE HBASE_REGIONSERVER
> vm-3.vm YARN NODEMANAGER
> vm-3.vm HBASE HBASE_REGIONSERVER
> {code}
> All Nagios alerts are wrapped in a shell script (check_wrapper.sh). When
> an alert is generated, this wrapper checks whether the hostname, service
> name, and component name for the alert are present in
> /var/nagios/ignore.dat. If they are, the alert is suppressed.
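
The lookup described above can be sketched roughly as follows. This is only an illustration of the matching idea, not the actual check_wrapper.sh or mm_wrapper.py logic; the function names parse_ignore_entries and is_suppressed are hypothetical, and the sample data is a subset of the listing above.

```python
def parse_ignore_entries(text):
    """Parse ignore.dat-style text into a set of (host, service, component)."""
    entries = set()
    for line in text.splitlines():
        parts = line.split()
        # Keep only well-formed lines with exactly three fields.
        if len(parts) == 3:
            entries.add(tuple(parts))
    return entries


def is_suppressed(entries, host, service, component):
    """Return True when the alert origin matches an entry exactly."""
    return (host, service, component) in entries


# Subset of the sample ignore.dat content from the description.
SAMPLE = """\
vm-3.vm GANGLIA GANGLIA_MONITOR
vm-0.vm HBASE HBASE_MASTER
vm-3.vm HDFS DATANODE
"""

entries = parse_ignore_entries(SAMPLE)
listed = is_suppressed(entries, "vm-0.vm", "HBASE", "HBASE_MASTER")
unlisted = is_suppressed(entries, "vm-1.vm", "HDFS", "DATANODE")
print(listed, unlisted)
```

Note that the match is on the full triple, so an alert is suppressed only if its host, service, and component all agree with a single line of the file.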
> h2. What exactly is broken
> In AMBARI-6358 (https://issues.apache.org/jira/browse/AMBARI-6358) we had a
> requirement to have only one 'HBase Master CPU utilization' check even in HA
> mode. So this check is bound to the Nagios host (to be executed only once
> even if the HBase Master hostgroup has more than one host, as is done for
> the "* Percent Count" alerts). As a result, the origin data of the HBase
> Master alert does not match any entry in the file /var/nagios/ignore.dat.
> That is why the alert is not suppressed.
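
A minimal sketch of the mismatch, assuming the only differing field is the hostname. The Nagios server hostname "nagios.vm" is hypothetical (the description does not name it), and is_suppressed is an illustrative helper, not a function from the patch.

```python
# Entry written to ignore.dat for the host that actually runs HBase Master
# (taken from the sample listing in the description).
ignored = {("vm-0.vm", "HBASE", "HBASE_MASTER")}

# Hypothetical hostname of the Nagios server; not given in the description.
NAGIOS_HOST = "nagios.vm"


def is_suppressed(ignored, host, service, component):
    """Exact-match lookup of the alert origin against the ignore entries."""
    return (host, service, component) in ignored


# A check reported from the HBase Master host matches the entry.
per_host = is_suppressed(ignored, "vm-0.vm", "HBASE", "HBASE_MASTER")

# The CPU utilization check is bound to the Nagios host, so its origin
# carries that hostname and matches nothing in the ignore set.
bound_to_nagios = is_suppressed(ignored, NAGIOS_HOST, "HBASE", "HBASE_MASTER")

print(per_host, bound_to_nagios)
```

Under this assumption the problem disappears when HBase Master and the Nagios server share a host, which is consistent with the reproduction condition stated in the short story.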