[
https://issues.apache.org/jira/browse/AMBARI-24531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16704335#comment-16704335
]
Dmytro Grinenko commented on AMBARI-24531:
------------------------------------------
[~zmarsh13] a more appropriate solution would be:
{code}
is_topology_healthy = bool(active_namenodes and standby_namenodes and not unknown_namenodes)
{code}
We should not care how many healthy nodes there are, only that at least one
active and one standby node are present, while the number of "bad" (unknown)
nodes should be zero. This also removes the unnecessary calculation of the
list lengths.
The full patch is attached: [^AMBARI-24531.patch]
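As a minimal sketch of how this check behaves, the snippet below wraps the expression in a hypothetical helper function (the wrapper and the result variable names are illustrative and not taken from the patch) and applies it to the topology reported in the issue:

```python
def is_topology_healthy(active_namenodes, standby_namenodes, unknown_namenodes):
    """Healthy when active and standby nodes exist and no node is unknown."""
    # Non-empty lists are truthy in Python, so no len() calls are needed.
    return bool(active_namenodes and standby_namenodes and not unknown_namenodes)


# Topology from the issue description: 1 active, 2 standby, 0 unknown.
active = ['hdp2.labs.teradata.com:50070']
standby = ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070']
unknown = []

# Three NameNodes, all in a known state: healthy.
healthy_three_nn = is_topology_healthy(active, standby, unknown)

# One of the standby nodes reported as unknown instead: unhealthy.
unhealthy_unknown = is_topology_healthy(active, standby[:1], standby[1:])

# No standby node at all: unhealthy.
unhealthy_no_standby = is_topology_healthy(active, [], [])
```

With these inputs the three-NameNode topology is reported as healthy, while a missing standby node or any node in an unknown state makes it unhealthy.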
> Persistent critical "NameNode High Availability Health" alert after
> installing with 3 NameNodes
> -----------------------------------------------------------------------------------------------
>
> Key: AMBARI-24531
> URL: https://issues.apache.org/jira/browse/AMBARI-24531
> Project: Ambari
> Issue Type: Bug
> Components: alerts
> Affects Versions: 2.7.0
> Environment: sles12sp2
> Reporter: Zack Marsh
> Priority: Major
> Attachments: AMBARI-24531.patch
>
>
> After installing Hadoop with 3 NameNodes, there's a persistent alert in the
> Ambari UI for the HDFS service:
> {code:java}
> NameNode High Availability Health:
> Active['hdp2.labs.teradata.com:50070'],
> Standby['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070'],
> Unknown[]
> {code}
> This appears to stem from the alert_ha_namenode_health.py script, in which
> the NameNode topology is deemed unhealthy if there's not exactly 1 Standby
> NameNode.
> Excerpt from the alert_ha_namenode_health.py script:
> {code:java}
> # there's only one scenario here; there is exactly 1 active and 1 standby
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1
> result_label = 'Active{0}, Standby{1}, Unknown{2}'.format(str(active_namenodes),
>   str(standby_namenodes), str(unknown_namenodes))
> if is_topology_healthy:
>   # if there is exactly 1 active and 1 standby NN
>   return (RESULT_STATE_OK, [result_label])
> else:
>   # other scenario
>   return (RESULT_STATE_CRITICAL, [result_label]){code}
>
> Currently using the following workaround:
>
> 1. Replace the following line in {{alert_ha_namenode_health.py}}:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == 1{code}
> With:
> {code:java}
> is_topology_healthy = len(active_namenodes) == 1 and len(standby_namenodes) == len(nn_unique_ids)-1{code}
> 2. Restart Ambari Server
>
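For comparison, the original check and the workaround from the description can be sketched as two small functions and run against the reported topology. The function names and the NameNode ID values are hypothetical; only the two expressions come from the issue text.

```python
def original_check(active_namenodes, standby_namenodes):
    # Expression from the script excerpt: exactly 1 active and exactly 1 standby.
    return len(active_namenodes) == 1 and len(standby_namenodes) == 1


def workaround_check(active_namenodes, standby_namenodes, nn_unique_ids):
    # Expression from the workaround: exactly 1 active, every other NameNode standby.
    return (len(active_namenodes) == 1
            and len(standby_namenodes) == len(nn_unique_ids) - 1)


# Reported topology with three NameNodes.
active = ['hdp2.labs.teradata.com:50070']
standby = ['hdp1.labs.teradata.com:50070', 'hdp3.labs.teradata.com:50070']
nn_unique_ids = ['nn1', 'nn2', 'nn3']  # hypothetical ID values

original_result = original_check(active, standby)
workaround_result = workaround_check(active, standby, nn_unique_ids)

# A standby node that dropped out of the standby list fails the workaround too.
degraded_result = workaround_check(active, standby[:1], nn_unique_ids)
```

The original expression rejects the healthy three-NameNode topology, which is what produces the persistent critical alert, while the workaround accepts it. Unlike a purely truthiness-based check, the workaround still requires exactly one active NameNode.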