-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44397/
-----------------------------------------------------------

Review request for Ambari, Alejandro Fernandez, Dmitro Lisnichenko, Jayush 
Luniya, and Sumit Mohanty.


Bugs: AMBARI-15303
    https://issues.apache.org/jira/browse/AMBARI-15303


Repository: ambari


Description
-------

Alerts are "suppressed" by maintenance mode through a {{maintenance_state}} 
attribute that is reported in addition to the actual alert state:

{code}
      "Alert": {
        "cluster_name": "c1",
        "component_name": "METRICS_COLLECTOR",
        "definition_id": 43,
        "definition_name": "ams_metrics_collector_process",
        "host_name": "c6401.ambari.apache.org",
        "id": 28,
        "instance": null,
        "label": "Metrics Collector Process",
        "latest_timestamp": 1457108946118,
        "maintenance_state": "ON",
        "original_timestamp": 1457108646099,
        "scope": "ANY",
        "service_name": "AMBARI_METRICS",
        "state": "CRITICAL",
        "text": "Connection failed: [Errno 111] Connection refused to c6401.ambari.apache.org"
      }
{code}

When a host/service/component is placed into maintenance mode (MM), the 
database is updated so that all affected {{alert_current}} rows have their 
maintenance state updated as well.

However, this fails in two scenarios, because no {{alert_current}} row exists 
to update at the time MM is turned on:
- The alert hasn't been received yet in a brand new cluster.
- The alert definition was disabled, which removed all of its current alerts, 
and was then re-enabled.

In both cases, when constructing a new {{AlertCurrentEntity}}, we need to 
calculate the correct maintenance state.
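
As a rough illustration only (this is not the code in the diff), the idea can 
be sketched as follows. Every name here ({{AlertMaintenanceSketch}}, 
{{CurrentAlert}}, {{effectiveState}}, {{createCurrent}}) is a hypothetical 
stand-in rather than an Ambari class or method, and the rule that MM on the 
host, service, or component suppresses the alert is an assumption made for 
the example:

```java
/**
 * Minimal, self-contained sketch. None of these types are Ambari classes;
 * they only illustrate computing the maintenance state at creation time.
 */
final class AlertMaintenanceSketch {

  /** Hypothetical stand-in for the ON/OFF value reported as "maintenance_state". */
  enum MaintenanceState { OFF, ON }

  /** Hypothetical stand-in for a newly created row in alert_current. */
  static final class CurrentAlert {
    final String definitionName;
    final String state;
    final MaintenanceState maintenanceState;

    CurrentAlert(String definitionName, String state, MaintenanceState maintenanceState) {
      this.definitionName = definitionName;
      this.state = state;
      this.maintenanceState = maintenanceState;
    }
  }

  /**
   * Assumption: the alert is suppressed when the host, the service, or the
   * component it belongs to is currently in maintenance mode.
   */
  static MaintenanceState effectiveState(boolean hostInMM, boolean serviceInMM,
      boolean componentInMM) {
    return (hostInMM || serviceInMM || componentInMM)
        ? MaintenanceState.ON
        : MaintenanceState.OFF;
  }

  /**
   * Builds the new current alert with the maintenance state computed up front,
   * instead of relying on an earlier bulk update that had no row to touch.
   */
  static CurrentAlert createCurrent(String definitionName, String state,
      boolean hostInMM, boolean serviceInMM, boolean componentInMM) {
    return new CurrentAlert(definitionName, state,
        effectiveState(hostInMM, serviceInMM, componentInMM));
  }

  public static void main(String[] args) {
    // The component was put into MM before its first alert arrived.
    CurrentAlert alert =
        createCurrent("ams_metrics_collector_process", "CRITICAL", false, false, true);
    System.out.println(alert.definitionName + " maintenance_state=" + alert.maintenanceState);
  }
}
```

The point of the sketch is that the maintenance state is derived when the 
entity is constructed, so an alert that arrives after MM was enabled is still 
reported as suppressed.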


Diffs
-----

  ambari-server/src/main/java/org/apache/ambari/server/controller/MaintenanceStateHelper.java cd49e76 
  ambari-server/src/main/java/org/apache/ambari/server/events/listeners/alerts/AlertReceivedListener.java 9bbfe37 

Diff: https://reviews.apache.org/r/44397/diff/


Testing
-------

PENDING: Writing UTs and running tests now... 

Verified fix in an existing cluster by disabling alerts, then re-enabling them 
on a MM component with an active alert.


Thanks,

Jonathan Hurley
