[
https://issues.apache.org/jira/browse/AMBARI-15624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15224830#comment-15224830
]
Kyle R Dunn commented on AMBARI-15624:
--------------------------------------
It turns out this was an artifact of a network issue. The cluster hosts were
configured with MTU 9000 (jumbo frames), but the switch was misconfigured,
causing HTTP timeouts that ultimately manifested as this alert.
To test for this network issue, ping between the Ambari Server and the
Namenode hosts with -s 8192. A failure may indicate the same network
infrastructure misconfiguration.
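
The ping test above can be scripted. What follows is a rough sketch, not part
of the original fix. It assumes a Linux host with the iputils ping, IPv4, and
no IP options. The function names are hypothetical, and the optional "-M do"
(don't fragment) flag is an addition of mine that the original test did not use.

```python
import subprocess
from typing import List

IPV4_HEADER_BYTES = 20  # IPv4 header without options (assumption)
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_icmp_payload(mtu: int) -> int:
    """Largest ping payload (-s value) that fits in one IPv4 packet of size mtu."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES


def needs_jumbo_frames(payload: int, standard_mtu: int = 1500) -> bool:
    """True if the payload cannot fit in a single standard-MTU packet."""
    return payload > max_icmp_payload(standard_mtu)


def build_ping_command(host: str, payload: int = 8192, count: int = 3,
                       dont_fragment: bool = False) -> List[str]:
    """Build the ping argument list; -s 8192 mirrors the test in the comment."""
    cmd = ["ping", "-c", str(count), "-s", str(payload)]
    if dont_fragment:
        # Assumption: iputils ping on Linux; forbids fragmentation on the path.
        cmd += ["-M", "do"]
    cmd.append(host)
    return cmd


def large_ping_succeeds(host: str, payload: int = 8192) -> bool:
    """Run the ping; a non-zero exit status means no reply was received."""
    result = subprocess.run(build_ping_command(host, payload),
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0
```

Calling large_ping_succeeds() from the Ambari Server against each Namenode
host (and the reverse) reproduces the check. An 8192-byte payload exceeds the
1472 bytes that fit under a standard 1500 MTU, so the packet only arrives
intact if every hop supports jumbo frames.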
> HA Namenode Health Alert showing invalid state
> ----------------------------------------------
>
> Key: AMBARI-15624
> URL: https://issues.apache.org/jira/browse/AMBARI-15624
> Project: Ambari
> Issue Type: Bug
> Components: alerts
> Affects Versions: 2.2.1
> Reporter: Kyle R Dunn
>
> I recently did a deployment via Blueprint with an HA Namenode configuration.
> Several parameters did not receive host group substitution, so the active
> and standby Namenodes had to be manually formatted and bootstrapped,
> respectively. Now Ambari alerts show the standby Namenode in "unknown" state.
> If I fail over from nn2 to nn1, one Namenode shows as standby and the other
> as unknown. I've restarted Ambari server and HDFS, and disabled and re-enabled
> the alert, yet it persists. hdfs haadmin -getServiceState for nn1 and nn2
> shows one active and one standby Namenode.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)