[ https://issues.apache.org/jira/browse/AMBARI-9894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yusaku Sako updated AMBARI-9894:
--------------------------------
Assignee: Jonathan Hurley
> Alerts: YARN RM HA Alerts Are UNKNOWN Due to HA Redirects
> ---------------------------------------------------------
>
> Key: AMBARI-9894
> URL: https://issues.apache.org/jira/browse/AMBARI-9894
> Project: Ambari
> Issue Type: Bug
> Reporter: Jonathan Hurley
> Assignee: Jonathan Hurley
> Priority: Critical
> Fix For: 2.0.0
>
> Attachments: AMBARI-9894.patch
>
>
> 3-node cluster
> Configured ResourceManager HA. Three alerts are now Unknown:
> - ResourceManager RPC Latency. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> - NodeManager Health Summary. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> - ResourceManager CPU Utiliz. Has two instances as expected but each is
> unknown "No JSON object could be decoded".
> Both RMs are running and I can quick link over to RMUI + JMX.
> This fails because YARN forwards requests for the standby RM to
> the active one. In this scenario, the alert gets back an HTTP 200 response
> that looks like:
> {noformat}
> This is standby RM. Redirecting to the current active RM:
> http://c6403.ambari.apache.org:8088/
> {noformat}
> Unfortunately, this is a refresh-header redirect, which the metric alert
> cannot handle. The reason that the alerts work now is that after the VMs
> restarted, the original RM became active again.
> There are a few issues here:
> - YARN doesn't do HA in the same way that other services like HDFS do. As a
> result, there's no config property that could let the alert know what to do
> or which hosts to contact.
> - YARN actually forwards after an HTTP 200 to the active node, which doesn't
> jibe with how alerts work.
> This is a definite problem and requires some further investigation.
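For illustration only, here is a minimal Python sketch of how an alert could recognize the standby response described above instead of failing with "No JSON object could be decoded". It is not the attached patch. It assumes the standby RM sends a Refresh header of the form "3; url=<active RM URL>" and falls back to scanning the body text shown in the description. The function names and the c6402 hostname in the usage note are hypothetical.

```python
import re
from urllib.parse import urljoin

# Text taken from the standby response body quoted in the description.
STANDBY_MARKER = "This is standby RM"


def parse_refresh_header(value):
    """Parse a Refresh header value such as '3; url=http://host:8088/'.

    Returns (delay_seconds, url_or_None), or None if the value is unusable.
    The '<seconds>; url=<target>' layout is an assumption, not confirmed here.
    """
    if not value:
        return None
    parts = value.split(";", 1)
    try:
        delay = float(parts[0].strip())
    except ValueError:
        return None
    url = None
    if len(parts) == 2:
        match = re.match(r"\s*url\s*=\s*(.+)", parts[1], re.IGNORECASE)
        if match:
            url = match.group(1).strip().strip("'\"")
    return delay, url


def active_rm_url(headers, body, requested_url):
    """Return the URL the alert should query instead of requested_url.

    Returns None when the response does not look like a standby redirect,
    in which case the body can be parsed as JSON as usual.
    """
    for name, value in headers.items():
        if name.lower() == "refresh":
            parsed = parse_refresh_header(value)
            if parsed and parsed[1]:
                # Resolve relative targets against the URL that was requested.
                return urljoin(requested_url, parsed[1])
    # Fallback: pull the first URL out of the plain-text standby message.
    if STANDBY_MARKER in body:
        match = re.search(r"https?://\S+", body)
        if match:
            return match.group(0)
    return None
```

With the body quoted above and no headers, active_rm_url({}, body, "http://c6402.ambari.apache.org:8088/jmx") returns "http://c6403.ambari.apache.org:8088/". For an ordinary JMX JSON response it returns None. Whether the alert should then re-query the active RM or report the standby instance differently is a separate design decision.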
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)