-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/44397/
-----------------------------------------------------------
(Updated March 4, 2016, 2:25 p.m.)
Review request for Ambari, Alejandro Fernandez, Dmitro Lisnichenko, Jayush
Luniya, and Sumit Mohanty.
Bugs: AMBARI-15303
https://issues.apache.org/jira/browse/AMBARI-15303
Repository: ambari
Description
-------
Alerts are "suppressed" while in maintenance mode by carrying a
{{maintenance_state}} attribute alongside the actual state being reported:
{code}
"Alert": {
  "cluster_name": "c1",
  "component_name": "METRICS_COLLECTOR",
  "definition_id": 43,
  "definition_name": "ams_metrics_collector_process",
  "host_name": "c6401.ambari.apache.org",
  "id": 28,
  "instance": null,
  "label": "Metrics Collector Process",
  "latest_timestamp": 1457108946118,
  "maintenance_state": "ON",
  "original_timestamp": 1457108646099,
  "scope": "ANY",
  "service_name": "AMBARI_METRICS",
  "state": "CRITICAL",
  "text": "Connection failed: [Errno 111] Connection refused to c6401.ambari.apache.org"
}
{code}
When a host, service, or component is placed into maintenance mode (MM), the
database is updated so that every affected {{alert_current}} row has its
maintenance state updated as well.
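As a rough illustration of that update, here is a minimal in-memory sketch. It
is not the Ambari code: the class AlertMaintenanceSketch, the CurrentAlert
type, the two-value MaintenanceState enum, and setHostMaintenance() are all
hypothetical stand-ins, and the second host and component in main() are made up
for the example. Only the host case is shown; service and component would
follow the same pattern.

```java
import java.util.ArrayList;
import java.util.List;

public class AlertMaintenanceSketch {
    /** Simplified, hypothetical two-value maintenance state. */
    enum MaintenanceState { OFF, ON }

    /** Hypothetical stand-in for one alert_current row. */
    static final class CurrentAlert {
        final String hostName;
        final String serviceName;
        final String componentName;
        MaintenanceState maintenanceState = MaintenanceState.OFF;

        CurrentAlert(String hostName, String serviceName, String componentName) {
            this.hostName = hostName;
            this.serviceName = serviceName;
            this.componentName = componentName;
        }
    }

    /**
     * Sets the maintenance state on every existing row for the given host
     * and returns the number of rows that were touched.
     */
    static int setHostMaintenance(List<CurrentAlert> rows, String hostName,
                                  MaintenanceState state) {
        int updated = 0;
        for (CurrentAlert row : rows) {
            if (row.hostName.equals(hostName)) {
                row.maintenanceState = state;
                updated++;
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<CurrentAlert> rows = new ArrayList<>();
        rows.add(new CurrentAlert("c6401.ambari.apache.org",
                "AMBARI_METRICS", "METRICS_COLLECTOR"));
        // Made-up second row, present only to show that other hosts are untouched.
        rows.add(new CurrentAlert("c6402.ambari.apache.org",
                "AMBARI_METRICS", "METRICS_MONITOR"));

        int updated = setHostMaintenance(rows, "c6401.ambari.apache.org",
                MaintenanceState.ON);
        System.out.println(updated + " row(s) updated");
    }
}
```

Note that the update only reaches rows that already exist at the moment the
state changes.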
However, this fails under two scenarios:
- The alert hasn't been received yet in a brand new cluster
- The alert definition was disabled, which removed all of its current alerts,
and was then re-enabled.
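In both cases the {{alert_current}} row is created after MM was already set, so
the new row does not pick up the existing maintenance state. When constructing
a new {{AlertCurrentEntity}}, we need to calculate the correct maintenance
state.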
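The idea can be sketched as follows. This is an illustration under stated
assumptions, not the code in the diff: NewAlertMaintenanceSketch,
MaintenanceRegistry, effectiveState(), and createCurrentAlert() are
hypothetical names, the enum is reduced to ON/OFF, and the sketch assumes that
MM on the host, the service, or the host's component is each enough to mark the
new alert as being in MM.

```java
import java.util.HashSet;
import java.util.Set;

public class NewAlertMaintenanceSketch {
    /** Simplified, hypothetical two-value maintenance state. */
    enum MaintenanceState { OFF, ON }

    /** Hypothetical record of what is currently in maintenance mode. */
    static final class MaintenanceRegistry {
        final Set<String> hosts = new HashSet<>();
        final Set<String> services = new HashSet<>();
        final Set<String> components = new HashSet<>(); // keyed as "host/component"

        /** Assumption: MM at any of the three levels suppresses the alert. */
        MaintenanceState effectiveState(String host, String service, String component) {
            boolean inMaintenance =
                    (host != null && hosts.contains(host))
                    || (service != null && services.contains(service))
                    || (host != null && component != null
                        && components.contains(host + "/" + component));
            return inMaintenance ? MaintenanceState.ON : MaintenanceState.OFF;
        }
    }

    /** Hypothetical stand-in for a newly constructed current-alert row. */
    static final class CurrentAlert {
        final String hostName;
        final String serviceName;
        final String componentName;
        final MaintenanceState maintenanceState;

        CurrentAlert(String hostName, String serviceName, String componentName,
                     MaintenanceState maintenanceState) {
            this.hostName = hostName;
            this.serviceName = serviceName;
            this.componentName = componentName;
            this.maintenanceState = maintenanceState;
        }
    }

    /** Computes the maintenance state at construction time instead of defaulting it. */
    static CurrentAlert createCurrentAlert(MaintenanceRegistry registry, String host,
                                           String service, String component) {
        return new CurrentAlert(host, service, component,
                registry.effectiveState(host, service, component));
    }

    public static void main(String[] args) {
        MaintenanceRegistry registry = new MaintenanceRegistry();
        // The component was put into MM before any alert row existed for it.
        registry.components.add("c6401.ambari.apache.org/METRICS_COLLECTOR");

        CurrentAlert alert = createCurrentAlert(registry, "c6401.ambari.apache.org",
                "AMBARI_METRICS", "METRICS_COLLECTOR");
        System.out.println("maintenance_state=" + alert.maintenanceState);
    }
}
```

Because the state is derived when the row is built, it no longer matters
whether the row existed at the time MM was turned on.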
Diffs
-----
ambari-server/src/main/java/org/apache/ambari/server/controller/MaintenanceStateHelper.java
cd49e76
ambari-server/src/main/java/org/apache/ambari/server/events/listeners/alerts/AlertReceivedListener.java
9bbfe37
ambari-server/src/test/java/org/apache/ambari/server/controller/MaintenanceStateHelperTest.java
d9c5039
ambari-server/src/test/java/org/apache/ambari/server/state/alerts/AlertReceivedListenerTest.java
6e58876
Diff: https://reviews.apache.org/r/44397/diff/
Testing (updated)
-------
Tests run: 3918, Failures: 0, Errors: 0, Skipped: 33
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 30:06 min
[INFO] Finished at: 2016-03-04T13:32:30-05:00
[INFO] Final Memory: 50M/607M
[INFO] ------------------------------------------------------------------------
Verified the fix in an existing cluster by disabling alerts, then re-enabling
them on an MM component with an active alert.
Thanks,
Jonathan Hurley