[jira] [Commented] (AMBARI-24531) Persistent critical "NameNode High Availability Health" alert after installing with 3 NameNodes

Dmytro Grinenko (JIRA) Thu, 29 Nov 2018 22:57:34 -0800


    [ 
https://issues.apache.org/jira/browse/AMBARI-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704335#comment-16704335
 ]


Dmytro Grinenko commented on AMBARI-24531:
------------------------------------------

[~zmarsh13] a more appropriate solution would be: 
{code}
is_topology_healthy = bool(active_namenodes and standby_namenodes and not unknown_namenodes)
{code}

We should not really care about the number of healthy nodes; they only need to 
be present. The number of "bad" (unknown) nodes, however, should be zero. This 
also removes the unnecessary calculation of the list lengths.
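
To make the difference concrete, here is a minimal sketch comparing the three checks discussed in this issue against the topology from the report. The function names and the {{nn_unique_ids}} values are hypothetical and only for illustration; the predicates themselves are copied from the issue description and from the line above.

```python
# Illustrative sketch only: function names and nn_unique_ids values are
# hypothetical; the predicates are the ones quoted in this issue.

def healthy_original(active_namenodes, standby_namenodes, unknown_namenodes):
    # Check currently in alert_ha_namenode_health.py: exactly 1 active, 1 standby.
    return len(active_namenodes) == 1 and len(standby_namenodes) == 1


def healthy_workaround(active_namenodes, standby_namenodes, nn_unique_ids):
    # Reporter's workaround: exactly 1 active, every other NameNode standby.
    return (len(active_namenodes) == 1
            and len(standby_namenodes) == len(nn_unique_ids) - 1)


def healthy_proposed(active_namenodes, standby_namenodes, unknown_namenodes):
    # Proposed check: some active, some standby, and no unknown NameNodes.
    return bool(active_namenodes and standby_namenodes and not unknown_namenodes)


# Topology taken from the alert text in the issue description.
active = ['hdp2.labs.teradata.com:50070']
standby = ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070']
unknown = []
nn_unique_ids = ['nn1', 'nn2', 'nn3']  # assumed IDs, not from the issue

print(healthy_original(active, standby, unknown))          # False
print(healthy_workaround(active, standby, nn_unique_ids))  # True
print(healthy_proposed(active, standby, unknown))          # True
```

Note that the truthiness-based check, as written, only requires the active list to be non-empty, so it does not distinguish one active NameNode from several.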

The full patch is attached [^AMBARI-24531.patch] 

> Persistent critical "NameNode High Availability Health" alert after 
> installing with 3 NameNodes
> -----------------------------------------------------------------------------------------------
>
>                 Key: AMBARI-24531
>                 URL: https://issues.apache.org/jira/browse/AMBARI-24531
>             Project: Ambari
>          Issue Type: Bug
>          Components: alerts
>    Affects Versions: 2.7.0
>         Environment: sles12sp2
>            Reporter: Zack Marsh
>            Priority: Major
>         Attachments: AMBARI-24531.patch
>
>
> After installing Hadoop with 3 NameNodes, there's a persistent alert in the 
> Ambari UI for the HDFS service:
> {code:java}
> NameNode High Availability Health:
> Active['hdp2.labs.teradata.com:50070'], 
> Standby['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'], 
> Unknown[]
> {code}
> This appears to stem from the alert_ha_namenode_health.py script, in which 
> the NameNode topology is deemed unhealthy if there's not exactly 1 Standby 
> NameNode.
> Excerpt from the alert_ha_namenode_health.py script:
> {code:java}
>   # there's only one scenario here; there is exactly 1 active and 1 standby
>   is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1
>   result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(str(active_namenodes),
>     str(standby_namenodes), str(unknown_namenodes))
>   if is_topology_healthy:
>     # if there is exactly 1 active and 1 standby NN
>     return (RESULT_STATE_OK, [result_label])
>   else:
>     # other scenario
>     return (RESULT_STATE_CRITICAL, [result_label]){code}
>  
> Currently using the following workaround:
>  
> 1. Replacing the following line in {{alert_ha_namenode_health.py}}:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1{code}
> With:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == len(nn_unique_ids)-1{code}
> 2. Restart Ambari Server
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
