[
https://issues.apache.org/jira/browse/AMBARI-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347028#comment-14347028
]
Hudson commented on AMBARI-9894:
--------------------------------
SUCCESS: Integrated in Ambari-trunk-Commit #1944 (See
[https://builds.apache.org/job/Ambari-trunk-Commit/1944/])
AMBARI-9894 - Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects
(jonathanhurley) (jhurley:
http://git-wip-us.apache.org/repos/asf?p=ambari.git&a=commit&h=fdab48d6ba2efe23fd5729448b00752a61ae0059)
* ambari-agent/src/main/python/ambari_agent/alerts/metric_alert.py
* ambari-agent/src/main/python/ambari_agent/alerts/base_alert.py
* ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/alerts.json
* ambari-agent/src/test/python/ambari_agent/TestAlerts.py
* ambari-common/src/main/python/ambari_commons/urllib_handlers.py
* ambari-server/src/main/resources/stacks/BIGTOP/0.8/services/YARN/alerts.json
* ambari-server/src/main/resources/common-services/YARN/2.1.0.2.0/package/alerts/alert_nodemanagers_summary.py
> Alerts: YARN YM HA Alerts Are UNKNOWN Due to HA Redirects
> ---------------------------------------------------------
>
> Key: AMBARI-9894
> URL: https://issues.apache.org/jira/browse/AMBARI-9894
> Project: Ambari
> Issue Type: Bug
> Reporter: Jonathan Hurley
> Priority: Critical
> Attachments: AMBARI-9894.patch
>
>
> 3-node cluster
> Configured ResourceManager HA. Three alerts are now Unknown:
> - ResourceManager RPC Latency. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> - NodeManager Health Summary. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> - ResourceManager CPU Utiliz. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> Both RMs are running and I can quick link over to the RM UI + JMX.
> This fails because YARN forwards requests for the standby RM to the active
> one. In this scenario, the alert gets back an HTTP 200 response that looks
> like:
> {noformat}
> This is standby RM. Redirecting to the current active RM:
> http://c6403.ambari.apache.org:8088/
> {noformat}
> Unfortunately, this is a Refresh header redirect, which the metric alert
> cannot handle. The alerts only work because, after the VMs restarted, the
> original RM became active again.
> There are a few issues here:
> - YARN doesn't do HA in the same way that other services like HDFS do. As a
> result, there's no config property that could let the alert know what to do
> or which hosts to contact.
> - YARN actually forwards to the active node after an HTTP 200, which doesn't
> jibe with how alerts work.
> This is a definite problem and requires some further investigation.
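To illustrate the problem described above, here is a minimal sketch of one way
a client could follow a Refresh header redirect returned with an HTTP 200. It
is not taken from the attached patch. It uses Python 3's urllib.request, the
names parse_refresh_header and RefreshHeaderHandler are hypothetical, and it
assumes the header has the usual form "<seconds>; url=<target>".

```python
import urllib.request


def parse_refresh_header(value):
    """Return the target URL from a Refresh header value, or None.

    Assumes the usual form '<seconds>; url=<target>' (an assumption; the
    issue text does not show the exact header value).
    """
    if not value:
        return None
    # Split off the delay; everything after the first ';' is the target part.
    _, sep, rest = value.partition(";")
    if not sep:
        return None
    key, eq, target = rest.strip().partition("=")
    if not eq or key.strip().lower() != "url":
        return None
    target = target.strip().strip("'\"")
    return target or None


class RefreshHeaderHandler(urllib.request.BaseHandler):
    """Hypothetical handler: treat a 200 response that carries a Refresh
    header as a temporary redirect, so the stock redirect handler follows it.
    """

    def http_response(self, request, response):
        headers = response.info()
        target = parse_refresh_header(headers.get("Refresh"))
        if response.code == 200 and target:
            # Expose the target as a Location header and hand the response
            # to the opener's error chain as a 307, which the default
            # HTTPRedirectHandler knows how to follow.
            headers["Location"] = target
            return self.parent.error(
                "http", request, response, 307, response.msg, headers)
        return response

    https_response = http_response
```

An opener built with urllib.request.build_opener(RefreshHeaderHandler()) would
then fetch the JMX data from the active RM even when the alert is pointed at
the standby one, provided the standby really does send such a header.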
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)