Zack Marsh created AMBARI-24531:
-----------------------------------

             Summary: Persistent critical "NameNode High Availability Health" alert after installing with 3 NameNodes
                 Key: AMBARI-24531
                 URL: https://issues.apache.org/jira/browse/AMBARI-24531
             Project: Ambari
          Issue Type: Bug
          Components: alerts
    Affects Versions: 2.7.0
         Environment: sles12sp2
            Reporter: Zack Marsh


After installing Hadoop with 3 NameNodes, there's a persistent alert in the 
Ambari UI for the HDFS service:
{code:java}
NameNode High Availability Health:
Active['hdp2.labs.teradata.com:50070'], Standby['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'], Unknown[]
{code}
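This appears to stem from the alert_ha_namenode_health.py script, which treats 
the NameNode topology as healthy only when there is exactly 1 Active and 
exactly 1 Standby NameNode. With 3 NameNodes there are 2 Standbys, so the check 
fails even though the cluster is in the expected state.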

Excerpt from the alert_ha_namenode_health.py script:
{code:java}
  # there's only one scenario here; there is exactly 1 active and 1 standby
  is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1

  result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(str(active_namenodes),
    str(standby_namenodes), str(unknown_namenodes))

  if is_topology_healthy:
    # if there is exactly 1 active and 1 standby NN
    return (RESULT_STATE_OK, [result_label])
  else:
    # other scenario
    return (RESULT_STATE_CRITICAL, [result_label])
{code}
 

Currently using the following workaround:

1. Replace the following line in {{alert_ha_namenode_health.py}}:
{code:java}
is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1{code}
With:
{code:java}
is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == len(nn_unique_ids)-1{code}
2. Restart Ambari Server.
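
For illustration, the workaround's check can be pulled out into a standalone function and run against the topology from the alert above. This is only a sketch, not the shipped script: the function name {{evaluate_topology}} and the IDs {{nn1}}/{{nn2}}/{{nn3}} are hypothetical, the two RESULT_STATE_* constants are local stand-ins for the ones the script imports, and {{nn_unique_ids}} is assumed to be a list with one entry per configured NameNode.

```python
# Local stand-ins for the constants used by the alert script (assumed values).
RESULT_STATE_OK = 'OK'
RESULT_STATE_CRITICAL = 'CRITICAL'


def evaluate_topology(nn_unique_ids, active_namenodes, standby_namenodes,
                      unknown_namenodes):
    """Hypothetical helper: return (state, [label]) for N configured NameNodes."""
    # Healthy means exactly 1 active, and every other configured NameNode
    # is reporting standby (so none are left in the unknown list).
    is_topology_healthy = (len(active_namenodes) == 1 and
                           len(standby_namenodes) == len(nn_unique_ids) - 1)

    result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(
        str(active_namenodes), str(standby_namenodes), str(unknown_namenodes))

    if is_topology_healthy:
        return (RESULT_STATE_OK, [result_label])
    return (RESULT_STATE_CRITICAL, [result_label])


if __name__ == '__main__':
    # Topology from the alert text: 1 active, 2 standby, 0 unknown.
    state, labels = evaluate_topology(
        ['nn1', 'nn2', 'nn3'],  # hypothetical NameNode IDs
        ['hdp2.labs.teradata.com:50070'],
        ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'],
        [])
    print(state, labels[0])
```

With 2 configured NameNodes the condition reduces to the original "1 active and 1 standby" check, so existing 2-NameNode clusters should behave as before.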

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
