[https://issues.apache.org/jira/browse/AMBARI-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689061#comment-15689061]
Weiwei Yang commented on AMBARI-18929:
--------------------------------------
Hello [~Tim Thorpe], [~dili]
Thanks for looking into this. Steps to reproduce:
1. Enable RM HA
2. Stop the standby RM (or the active one)
3. Run the YARN service check
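For anyone scripting steps 2 and 3 instead of using the Ambari UI, the sketch below builds the two Ambari REST requests involved without sending them. It assumes RM HA is already enabled (step 1). The cluster name `mycluster`, the host `rm2.example.com`, and the helper names `stop_rm_request` and `yarn_service_check_request` are hypothetical placeholders, not part of Ambari. Ambari expects modifying requests like these to carry an `X-Requested-By` header.

```python
import json

# Hypothetical cluster and host names; substitute your own.
CLUSTER = "mycluster"
RM_HOST = "rm2.example.com"  # host whose ResourceManager will be stopped


def stop_rm_request(cluster, host):
    """Step 2: move one ResourceManager to INSTALLED, i.e. stopped."""
    path = "/api/v1/clusters/%s/hosts/%s/host_components/RESOURCEMANAGER" % (cluster, host)
    body = {
        "RequestInfo": {"context": "Stop ResourceManager"},
        "Body": {"HostRoles": {"state": "INSTALLED"}},
    }
    return "PUT", path, json.dumps(body)


def yarn_service_check_request(cluster):
    """Step 3: ask Ambari to run the YARN service check."""
    path = "/api/v1/clusters/%s/requests" % cluster
    body = {
        "RequestInfo": {"context": "YARN Service Check", "command": "YARN_SERVICE_CHECK"},
        "Requests/resource_filters": [{"service_name": "YARN"}],
    }
    return "POST", path, json.dumps(body)


if __name__ == "__main__":
    # Only prints the requests; send them with any HTTP client against the Ambari server.
    for method, path, body in (stop_rm_request(CLUSTER, RM_HOST),
                               yarn_service_check_request(CLUSTER)):
        print("%s %s %s" % (method, path, body))
```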
The service check fails with the following error:
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 159, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 136, in service_check
    path='/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin',
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl --negotiate -u : -ksL --connect-timeout 5 http://eked1.fyre.ibm.com:8088/ws/v1/cluster/apps/application_1479880110367_0003 1>/tmp/tmpTevXud 2>/tmp/tmpw06IwQ' returned 7.
{code}
The corresponding command output:
{code}
...
2016-11-22 22:04:35,429 - call returned (0, '')
2016-11-22 22:04:35,431 - call['ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'curl --negotiate -u : -ksL --connect-timeout 5 http://eked1.fyre.ibm.com:8088/ws/v1/cluster/apps/application_1479880110367_0003 1>/tmp/tmpTevXud 2>/tmp/tmpw06IwQ''] {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'quiet': False}
2016-11-22 22:04:35,466 - call returned (7, '')
{code}
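As long as one RM is not responding, the service check fails. curl exit code 7 means it could not connect to the host, so the check queried an RM address (eked1.fyre.ibm.com:8088) that was not reachable instead of falling back to the RM that is still running.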
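One possible direction for a fix, sketched here under assumptions: instead of querying a single RM address, the check could try each RM web address in turn and treat a failed curl call (for example exit code 7, "could not connect") as "this RM is down, try the next one". The function names `curl_cmd` and `probe_rms`, the injected `runner` parameter (so the logic can be tested without a cluster), and the example addresses are hypothetical; in the real script the addresses would presumably come from the YARN configuration (e.g. `yarn.resourcemanager.webapp.address.<rm-id>`). The curl flags mirror the ones in the log above.

```python
import subprocess


def curl_cmd(url):
    """Build a curl command like the one in the service check log."""
    # -o /dev/null discards the response body; only the exit code matters here.
    return ["curl", "--negotiate", "-u", ":", "-ksL", "--connect-timeout", "5",
            "-o", "/dev/null", url]


def probe_rms(rm_addresses, app_id, runner=subprocess.call):
    """Try each RM web address in order.

    Returns a tuple (first address whose request succeeded, or None;
    dict of address -> curl exit code for every address that failed before it).
    """
    failures = {}
    for address in rm_addresses:
        url = "http://%s/ws/v1/cluster/apps/%s" % (address, app_id)
        code = runner(curl_cmd(url))
        if code == 0:
            return address, failures
        # Exit code 7 means this RM could not be reached; any other non-zero
        # code is recorded too, and the next RM is tried either way.
        failures[address] = code
    return None, failures


if __name__ == "__main__":
    # Hypothetical RM web addresses.
    rms = ["rm1.example.com:8088", "rm2.example.com:8088"]
    print(probe_rms(rms, "application_1479880110367_0003"))
```

Because the failures are returned alongside the first reachable address, the caller can still report which RM did not answer rather than silently ignoring it.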
> Yarn service check fails when either resource manager is down in HA enabled cluster
> -----------------------------------------------------------------------------------
>
> Key: AMBARI-18929
> URL: https://issues.apache.org/jira/browse/AMBARI-18929
> Project: Ambari
> Issue Type: Bug
> Components: ambari-server
> Affects Versions: 2.4.0
> Reporter: Weiwei Yang
>
> When HA is enabled, YARN service_check.py fails if one of the RMs is down, even
> if the other one is active. This gives the user the wrong impression that the YARN
> cluster is unhealthy. Instead, the service check should pass, or at least pass
> with a warning that lets the user know one RM is down.
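The pass/warn behaviour proposed above could be expressed as a small decision function. This is a minimal sketch assuming the per-RM HA state has already been collected, for example from the `haState` field of each RM's `/ws/v1/cluster/info` response, with `None` standing for an RM that did not respond; `summarize_rm_states` is a hypothetical name, not existing Ambari code.

```python
def summarize_rm_states(states):
    """Map per-RM HA states to a service check verdict.

    states: dict of RM id -> "ACTIVE", "STANDBY", or None if the RM did not respond.
    Returns (verdict, message), where verdict is "OK", "WARNING" or "FAIL".
    """
    down = sorted(rm for rm, state in states.items() if state is None)
    active = sorted(rm for rm, state in states.items() if state == "ACTIVE")
    if not active:
        # No RM can serve requests: the cluster really is unhealthy.
        return "FAIL", "no active ResourceManager"
    if down:
        # YARN still works, but the user should know HA is degraded.
        return "WARNING", "active RM %s; not responding: %s" % (active[0], ", ".join(down))
    return "OK", "active RM %s" % active[0]


if __name__ == "__main__":
    print(summarize_rm_states({"rm1": None, "rm2": "ACTIVE"}))
```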
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)