-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27582/
-----------------------------------------------------------

(Updated Nov. 4, 2014, 2:19 p.m.)


Review request for Ambari, Nate Cole and Tom Beerbower.


Changes
-------

The alerts.json changes were not staged in the previous patch; this update 
includes them.


Bugs: AMBARI-8143
    https://issues.apache.org/jira/browse/AMBARI-8143


Repository: ambari


Description
-------

The NameNode HA Health Check is unique in that it requires knowledge of the 
states of both the active and standby NameNodes in order to calculate the 
correct alert state. It also doesn't need to run on every host that has a 
NAMENODE component. 

This presents a problem for alerts as there is no way to say, "Run this alert, 
but only on one host, not both". It's also a problem because if the host you 
want to run it on goes down, the alert won't run. And finally, it's a problem 
because if you run the alert on host1 and then that host fails and host2 takes 
over, the alert appears to be from another host and does not replace the 
original alert.

To solve this, the following changes were made:

- SCRIPT alerts can now return a status of SKIPPED, meaning that they ran 
successfully but don't need to report back their status to the Ambari server; 
nothing from them will get put into the agent's alert collector.

- Alert definitions have a new property to ignore the hosts that the alert 
instances are originating from. This allows any host to run the alert and 
report back to Ambari, but the server will collapse these into a single current 
alert; there won't be multiple history items either.
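
As a rough illustration of the two changes above, here is a minimal Python 
sketch. It is not the code from this patch: run_script_alert, collapse_alert, 
the dictionary layout, and the ignore_host flag are hypothetical names chosen 
for the example (the actual name of the new definition property is not given 
in this description), and the server-side logic is really implemented in Java.

```python
SKIPPED = "SKIPPED"


def run_script_alert(execute, collector, host_name):
    """Run a script alert and collect its result unless it reports SKIPPED.

    `execute` is a hypothetical callable returning a (state, text) tuple.
    """
    state, text = execute(host_name)
    if state == SKIPPED:
        # The script ran successfully but has nothing to report,
        # so nothing is put into the collector.
        return None
    alert = {"state": state, "text": text, "host": host_name}
    collector.append(alert)
    return alert


def collapse_alert(current, definition_name, alert, ignore_host):
    """Store an incoming alert as the current alert for its definition.

    When ignore_host is True the originating host is left out of the key,
    so alerts from any host replace the same single current alert.
    """
    key = (definition_name, None if ignore_host else alert["host"])
    current[key] = alert
    return current
```

With ignore_host set, an alert reported first from host1 and later from host2 
occupies the same slot, which is the "single current alert" behavior described 
above.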


Diffs (updated)
-----

  ambari-agent/src/main/python/ambari_agent/alerts/base_alert.py d93ec48 
  ambari-agent/src/main/python/ambari_agent/alerts/script_alert.py 12d0d2a 
  ambari-agent/src/test/python/ambari_agent/TestAlerts.py 1f8d0c0 
  ambari-agent/src/test/python/ambari_agent/dummy_files/test_script.py 278c26c 
  ambari-server/src/main/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProvider.java 86e7b7e 
  ambari-server/src/main/java/org/apache/ambari/server/events/listeners/AlertReceivedListener.java bcbe823 
  ambari-server/src/main/java/org/apache/ambari/server/orm/entities/AlertDefinitionEntity.java 6374342 
  ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinition.java 961fb66 
  ambari-server/src/main/java/org/apache/ambari/server/state/alert/AlertDefinitionFactory.java cd937ef 
  ambari-server/src/main/java/org/apache/ambari/server/upgrade/SchemaUpgradeHelper.java e1d5dca 
  ambari-server/src/main/java/org/apache/ambari/server/upgrade/UpgradeCatalog200.java PRE-CREATION 
  ambari-server/src/main/resources/Ambari-DDL-MySQL-CREATE.sql 1b16c2f 
  ambari-server/src/main/resources/Ambari-DDL-Oracle-CREATE.sql ef7b564 
  ambari-server/src/main/resources/Ambari-DDL-Postgres-CREATE.sql 18fe6d4 
  ambari-server/src/main/resources/Ambari-DDL-Postgres-EMBEDDED-CREATE.sql fa131fd 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/alerts.json a409230 
  ambari-server/src/main/resources/stacks/HDP/2.0.6/services/HDFS/package/files/alert_ha_namenode_health.py PRE-CREATION 
  ambari-server/src/test/java/org/apache/ambari/server/api/services/AmbariMetaInfoTest.java 2b1853a 
  ambari-server/src/test/java/org/apache/ambari/server/controller/internal/AlertDefinitionResourceProviderTest.java 7823994 
  ambari-server/src/test/java/org/apache/ambari/server/upgrade/UpgradeCatalog200Test.java PRE-CREATION 
  ambari-server/src/test/resources/stacks/HDP/2.0.5/services/HDFS/alerts.json 92e7b8f 

Diff: https://reviews.apache.org/r/27582/diff/


Testing
-------

Tested on clusters with HA both disabled and enabled. When enabled, verified 
that failing different NameNode instances had the correct effect on the 
alert:

        "state" : "CRITICAL",
        "text" : "Active[], Standby['c6402.ambari.apache.org:50070'], Unknown['c6401.ambari.apache.org:50070']"

        "state" : "CRITICAL",
        "text" : "Active['c6402.ambari.apache.org:50070'], Standby[], Unknown['c6401.ambari.apache.org:50070']"

        "state" : "OK",
        "text" : "Active['c6402.ambari.apache.org:50070'], Standby['c6401.ambari.apache.org:50070'], Unknown[]"
        
New unit tests were added as well.
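
For readers who want to see how results like the ones above could be derived, 
here is a small Python sketch. The rule it uses (OK only when at least one 
active and one standby NameNode are found, CRITICAL otherwise) is an 
assumption inferred from the three sample results; the function name and the 
input format are hypothetical and this is not the logic from 
alert_ha_namenode_health.py itself.

```python
def ha_namenode_health(states):
    """Compute an alert (state, text) pair from NameNode HA states.

    `states` maps a NameNode address to "active", "standby", or any
    other value, which is treated as unknown.
    """
    active = [nn for nn, s in states.items() if s == "active"]
    standby = [nn for nn, s in states.items() if s == "standby"]
    unknown = [nn for nn, s in states.items() if s not in ("active", "standby")]

    # Assumed rule: healthy only when both roles are filled.
    state = "OK" if active and standby else "CRITICAL"
    text = "Active{0}, Standby{1}, Unknown{2}".format(active, standby, unknown)
    return state, text


if __name__ == "__main__":
    print(ha_namenode_health({
        "c6401.ambari.apache.org:50070": "unreachable",
        "c6402.ambari.apache.org:50070": "standby",
    }))
```

Formatting the Python lists directly yields the bracketed, quoted host lists 
seen in the alert text.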


Thanks,

Jonathan Hurley
