-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27582/
-----------------------------------------------------------
Review request for Ambari, Nate Cole and Tom Beerbower.
Bugs: AMBARI-8143
https://issues.apache.org/jira/browse/AMBARI-8143
Repository: ambari
Description
-------
The NameNode HA Health Check is unique in that it requires knowledge of the
states of both the active and the standby (passive) NameNode in order to
calculate the correct alert state. It also doesn't need to run on every host
that has a NAMENODE component.
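Because the state depends on both NameNodes at once, the calculation can be
sketched as below. This is a minimal Python sketch and not the contents of
alert_ha_namenode_health.py. It assumes the per-NameNode HA states have
already been fetched into a hypothetical `namenode_states` dict that maps
"host:port" to "active", "standby", or None (state unknown). It also assumes
the topology is healthy only with exactly one active and exactly one standby
NameNode, which is consistent with the results listed under Testing.

```python
# Hypothetical sketch of the NameNode HA health calculation; the function
# name and input shape are illustrative, not taken from the patch.
from typing import Dict, Optional, Tuple

RESULT_OK = "OK"
RESULT_CRITICAL = "CRITICAL"


def ha_health(namenode_states: Dict[str, Optional[str]]) -> Tuple[str, str]:
    """Bucket every NameNode by HA state and derive one alert state.

    namenode_states maps "host:port" to "active", "standby", or None,
    where None means the NameNode's state could not be determined.
    """
    active, standby, unknown = [], [], []
    for address in sorted(namenode_states):
        state = namenode_states[address]
        if state == "active":
            active.append(address)
        elif state == "standby":
            standby.append(address)
        else:
            unknown.append(address)

    # Formatting a list of strings renders it as ['host:port'], which
    # matches the alert text format shown under Testing.
    label = "Active{0}, Standby{1}, Unknown{2}".format(active, standby, unknown)

    # Assumption: healthy means exactly one active and one standby NameNode.
    healthy = len(active) == 1 and len(standby) == 1
    return (RESULT_OK if healthy else RESULT_CRITICAL), label
```

For example, `ha_health({"c6401.ambari.apache.org:50070": "standby",
"c6402.ambari.apache.org:50070": "active"})` yields the OK result shown
under Testing. Both NameNodes are inspected in a single run, which is why the
alert needs to execute on only one host.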
This presents several problems for alerts. First, there is no way to say,
"Run this alert, but only on one host, not both." Second, if the host you want
to run it on goes down, the alert won't run. Finally, if you run the alert on
host1 and then that host fails and host2 takes over, the alert appears to come
from a different host and does not replace the original alert.
To solve this, the following changes were made:
- SCRIPT alerts can now return a status of SKIPPED, meaning that they ran
successfully but don't need to report their status back to the Ambari server;
nothing from them is put into the agent's alert collector.
- Alert definitions have a new property that tells the server to ignore the
hosts that alert instances originate from. This allows any host to run the
alert and report back to Ambari, but the server collapses these reports into a
single current alert; there won't be multiple history items either.
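The two changes can be modeled together. The following Python sketch is
illustrative only: the agent is Python but the server side is Java, and the
names `AlertResult`, `AlertCollector`, `CurrentAlerts`, and the `ignore_host`
flag are hypothetical stand-ins, not identifiers from the patch. It shows the
agent collector dropping SKIPPED results, and the server keying current alerts
by definition alone when a definition ignores hosts, so that results from
host1 and then host2 update one alert instead of creating two. History
handling is not modeled.

```python
# Hypothetical model of the agent- and server-side behavior; names are
# illustrative and do not come from the Ambari code base.
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

SKIPPED = "SKIPPED"


@dataclass
class AlertResult:
    definition: str
    host: str
    state: str
    text: str


@dataclass
class AlertCollector:
    """Agent side: holds results waiting to be sent to the server."""
    pending: List[AlertResult] = field(default_factory=list)

    def put(self, result: AlertResult) -> None:
        # A SKIPPED result ran successfully but has nothing to report,
        # so it never enters the collector.
        if result.state == SKIPPED:
            return
        self.pending.append(result)


@dataclass
class CurrentAlerts:
    """Server side: exactly one current alert per key."""
    ignore_host: Dict[str, bool]  # definition name -> hypothetical flag
    current: Dict[Tuple[str, Optional[str]], AlertResult] = field(
        default_factory=dict)

    def receive(self, result: AlertResult) -> None:
        # If the definition ignores hosts, the key has no host component,
        # so a result from any host replaces the same current alert.
        ignore = self.ignore_host.get(result.definition, False)
        host = None if ignore else result.host
        self.current[(result.definition, host)] = result
```

With `ignore_host={"nn_ha": True}`, receiving an OK result for "nn_ha" from
host1 and then a CRITICAL result from host2 leaves a single current alert,
now reported by host2; a definition without the flag keeps one alert per host.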
Diffs
-----
ambari-agent/src/main/python/ambari_agent/alerts/base_alert.py d93ec48
ambari-agent/src/main/python/ambari_agent/alerts/script_alert.py 12d0d2a
ambari-agent/src/test/python/ambari_agent/TestAlerts.py 1f8d0c0
ambari-agent/src/test/python/ambari_agent/dummy_files/test_script.py 278c26c
ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProvider.java 86e7b7e
ambari-server/src/main/java/org/apache/ambari/server/events/listeners/AlertReceivedListener.java bcbe823
ambari-server/src/main/java/org/apache/ambari/server/orm/entities/AlertDefinitionEntity.java 6374342
ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinition.java 961fb66
ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinitionFactory.java cd937ef
ambari-server/src/main/java/org/apache/ambari/server/upgrade/SchemaUpgradeHelper.java e1d5dca
ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog200.java PRE-CREATION
ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1b16c2f
ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql ef7b564
ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 18fe6d4
ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql fa131fd
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/alerts.json a409230
ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/files/alert_ha_namenode_health.py PRE-CREATION
ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProviderTest.java 7823994
ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog200Test.java PRE-CREATION
Diff: https://reviews.apache.org/r/27582/diff/
Testing
-------
Tested on clusters with HA both disabled and enabled. With HA enabled, verified
that failing different NameNode instances had the correct effect on the alert:
"state" : "CRITICAL",
"text" : "Active[], Standby['c6402.ambari.apache.org:50070'], Unknown['c6401.ambari.apache.org:50070']"

"state" : "CRITICAL",
"text" : "Active['c6402.ambari.apache.org:50070'], Standby[], Unknown['c6401.ambari.apache.org:50070']"

"state" : "OK",
"text" : "Active['c6402.ambari.apache.org:50070'], Standby['c6401.ambari.apache.org:50070'], Unknown[]"
New tests were added as well.
Thanks,
Jonathan Hurley