[ https://issues.apache.org/jira/browse/AMBARI-18929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15689061#comment-15689061 ]

Weiwei Yang commented on AMBARI-18929:
--------------------------------------

Hello [~Tim Thorpe], [~dili]

Thanks for taking a look. Steps to reproduce:

1. Enable RM HA
2. Stop the standby RM (or the active one)
3. Run the YARN service check

The service check fails with the following error:

{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 159, in <module>
    ServiceCheck().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/service_check.py", line 136, in service_check
    path='/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin',
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl --negotiate -u : -ksL --connect-timeout 5 http://eked1.fyre.ibm.com:8088/ws/v1/cluster/apps/application_1479880110367_0003 1>/tmp/tmpTevXud 2>/tmp/tmpw06IwQ' returned 7.
{code}

The service check output shows the same failing call:

{code}
...
2016-11-22 22:04:35,429 - call returned (0, '')
2016-11-22 22:04:35,431 - call['ambari-sudo.sh su ambari-qa -l -s /bin/bash -c 'curl --negotiate -u : -ksL --connect-timeout 5 http://eked1.fyre.ibm.com:8088/ws/v1/cluster/apps/application_1479880110367_0003 1>/tmp/tmpTevXud 2>/tmp/tmpw06IwQ''] {'path': '/usr/sbin:/sbin:/usr/local/bin:/bin:/usr/bin', 'quiet': False}
2016-11-22 22:04:35,466 - call returned (7, '')
{code}

As long as one RM is not responding, the service check fails. (curl exit code 7 means it could not connect to the host.)
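
A minimal sketch of the behavior the issue asks for, assuming the check may probe each RM address separately and pass when at least one of them answers. The names `probe_with_curl` and `check_rm_apps` are hypothetical and are not part of Ambari. The real script runs curl through Ambari's own helpers and with `--negotiate`, which is left out here.

```python
import subprocess


def probe_with_curl(url, timeout=5):
    """Return curl's exit code for a GET on url (0 = success, 7 = could not connect)."""
    cmd = ["curl", "-ksL", "--connect-timeout", str(timeout), "-o", "/dev/null", url]
    return subprocess.call(cmd)


def check_rm_apps(rm_addresses, app_path, probe=probe_with_curl):
    """Probe every RM address (hypothetical helper, not Ambari code).

    Returns (reachable, unreachable) so the caller can log a warning for RMs
    that are down. Raises only when no RM responds at all.
    """
    reachable, unreachable = [], []
    for address in rm_addresses:
        url = "http://{0}{1}".format(address, app_path)
        if probe(url) == 0:
            reachable.append(address)
        else:
            unreachable.append(address)
    if not reachable:
        raise RuntimeError("No ResourceManager responded: " + ", ".join(unreachable))
    return reachable, unreachable
```

With something like this, a caller could log the `unreachable` list as a warning and still report the check as passed, instead of failing on the first curl call that returns 7.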

> Yarn service check fails when either resource manager is down in HA enabled 
> cluster
> -----------------------------------------------------------------------------------
>
>                 Key: AMBARI-18929
>                 URL: https://issues.apache.org/jira/browse/AMBARI-18929
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-server
>    Affects Versions: 2.4.0
>            Reporter: Weiwei Yang
>
> When HA is enabled, the YARN service_check.py fails if one of the RMs is
> down, even if the other one is active. This gives the user the wrong
> impression that the YARN cluster is not healthy. Instead, the service check
> should pass, or at least pass with a warning that lets the user know one RM
> is down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
