-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27582/#review59838
-----------------------------------------------------------

Ship it!

- Nate Cole


On Nov. 4, 2014, 2:19 p.m., Jonathan Hurley wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27582/
> -----------------------------------------------------------
>
> (Updated Nov. 4, 2014, 2:19 p.m.)
>
>
> Review request for Ambari, Nate Cole and Tom Beerbower.
>
>
> Bugs: AMBARI-8143
>     https://issues.apache.org/jira/browse/AMBARI-8143
>
>
> Repository: ambari
>
>
> Description
> -------
>
> The NameNode HA Health Check is unique in that it requires knowledge of the
> states of both the active and passive NN in order to calculate the correct
> alert state. It also doesn't need to run on every host that has a NAMENODE
> component.
>
> This presents a problem for alerts as there is no way to say, "Run this
> alert, but only on one host, not both". It's also a problem because if the
> host you want to run it on goes down, the alert won't run. And finally, it's
> a problem because if you run the alert on host1 and then that host fails and
> host2 takes over, the alert appears to be from another host and does not
> replace the original alert.
>
> To solve this, the following changes were made:
>
> - SCRIPT alerts can now return a status of SKIPPED, meaning that they ran
>   successfully but don't need to report their status back to the Ambari
>   server; nothing from them will be put into the agent's alert collector.
>
> - Alert definitions have a new property to ignore the hosts that the alert
>   instances originate from.
>   This allows any host to run the alert and report back to Ambari, but the
>   server will collapse these into a single current alert; there won't be
>   multiple history items either.
>
>
> Diffs
> -----
>
>   ambari-agent/src/main/python/ambari_agent/alerts/base_alert.py d93ec48
>   ambari-agent/src/main/python/ambari_agent/alerts/script_alert.py 12d0d2a
>   ambari-agent/src/test/python/ambari_agent/TestAlerts.py 1f8d0c0
>   ambari-agent/src/test/python/ambari_agent/dummy_files/test_script.py 278c26c
>   ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProvider.java 86e7b7e
>   ambari-server/src/main/java/org/apache/ambari/server/events/listeners/AlertReceivedListener.java bcbe823
>   ambari-server/src/main/java/org/apache/ambari/server/orm/entities/AlertDefinitionEntity.java 6374342
>   ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinition.java 961fb66
>   ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinitionFactory.java cd937ef
>   ambari-server/src/main/java/org/apache/ambari/server/upgrade/SchemaUpgradeHelper.java e1d5dca
>   ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog200.java PRE-CREATION
>   ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1b16c2f
>   ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql ef7b564
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 18fe6d4
>   ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql fa131fd
>   ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/alerts.json a409230
>   ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/files/alert_ha_namenode_health.py PRE-CREATION
>   ambari-server/src/test/java/org/apache/ambari/server/api/services/AmbariMetaInfoTest.java 2b1853a
>   ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProviderTest.java 7823994
>   ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog200Test.java PRE-CREATION
>   ambari-server/src/test/resources/stacks/HDP/2.0.5/services/HDFS/alerts.json 92e7b8f
>
> Diff: https://reviews.apache.org/r/27582/diff/
>
>
> Testing
> -------
>
> Tested on clusters with HA both disabled and enabled. With HA enabled,
> verified that failing different NameNode instances had the correct effect on
> the alert:
>
> "state" : "CRITICAL",
> "text" : "Active[], Standby['c6402.ambari.apache.org:50070'], Unknown['c6401.ambari.apache.org:50070']"
>
> "state" : "CRITICAL",
> "text" : "Active['c6402.ambari.apache.org:50070'], Standby[], Unknown['c6401.ambari.apache.org:50070']"
>
> "state" : "OK",
> "text" : "Active['c6402.ambari.apache.org:50070'], Standby['c6401.ambari.apache.org:50070'], Unknown[]"
>
> New tests added as well...
>
>
> Thanks,
>
> Jonathan Hurley
>
>
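
For readers following the two changes in the Description, here is a rough
sketch of how they might fit together. It is not taken from the patch: the
function names, the dictionary keys and the ignore_host flag are all
hypothetical, and the state rule (CRITICAL unless both an active and a standby
NN are found) is only an assumption inferred from the sample output in the
Testing section.

```python
# Hypothetical sketch; none of these names come from the Ambari code base.
OK, CRITICAL, SKIPPED = "OK", "CRITICAL", "SKIPPED"


def ha_namenode_state(active, standby, unknown):
    """Return (state, text) for an HA pair, given three lists of NN addresses."""
    text = "Active{0}, Standby{1}, Unknown{2}".format(active, standby, unknown)
    # Assumption: healthy only when at least one active and one standby exist.
    state = OK if active and standby else CRITICAL
    return state, text


def collect(collector, result):
    """Agent side: a SKIPPED result ran fine but is never queued for the server."""
    if result["state"] != SKIPPED:
        collector.append(result)


def collapse(current, alert, ignore_host):
    """Server side: keep one current alert per definition when hosts are ignored."""
    # With ignore_host set, the reporting host is left out of the key, so an
    # alert from host2 replaces the earlier one from host1.
    key = (alert["name"], None if ignore_host else alert["host"])
    current[key] = alert
    return current
```

With ignore_host set, two reports of the same definition from host1 and host2
end up as a single entry in the dictionary; without it, they are stored as two
separate current alerts, which is the behavior the Description sets out to
avoid.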
