[ https://issues.apache.org/jira/browse/AMBARI-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15326436#comment-15326436 ]

Hadoop QA commented on AMBARI-17182:
------------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12809698/nnha_fix.patch
  against trunk revision .

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:red}-1 tests included{color}.  The patch doesn't appear to include any new or modified tests.
                        Please justify why no new tests are needed for this patch.
                        Also please list what manual steps were performed to verify this patch.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:green}+1 core tests{color}.  The patch passed unit tests in ambari-web.

Test results: https://builds.apache.org/job/Ambari-trunk-test-patch/7308//testReport/
Console output: https://builds.apache.org/job/Ambari-trunk-test-patch/7308//console

This message is automatically generated.

> App Timeline Server start fails on enabling HA because NameNode is in safemode
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-17182
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17182
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Victor Galgo
>            Priority: Critical
>              Labels: ha, namenode
>             Fix For: 2.4.0
>
>         Attachments: nnha_fix.patch
>
>
> On the last step of enabling HA, "Start all", the following happens:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 147, in <module>
>     ApplicationTimelineServer().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 43, in start
>     self.configure(env) # FOR SECURITY
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 54, in configure
>     yarn(name='apptimelineserver')
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 276, in yarn
>     mode=0755
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
>     self.action_delayed("create")
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
>     self.get_hdfs_resource_executor().action_delayed(action_name, self)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 259, in action_delayed
>     self._set_mode(self.target_status)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 366, in _set_mode
>     self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 195, in run_command
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://os-s11-3-iavzl-nat-s-ru242to25susesecha-12.openstacklocal:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'' returned status_code=403.
> {
>   "RemoteException": {
>     "exception": "RetriableException",
>     "javaClassName": "org.apache.hadoop.ipc.RetriableException",
>     "message": "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 675 needs additional 16 blocks to reach the threshold 0.9900 of total blocks 697.\nThe number of live datanodes 20 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
>   }
> }
> {code}
> This happens because the NameNode is not yet out of safemode when ATS starts: the DataNodes have only just started, so the block-report threshold has not been reached.
> To fix this, "stop namenodes" has to be triggered before "start all".
> With that ordering, "Start all" ensures that the DataNodes start before the NameNodes, and that the NameNodes are out of safemode before ATS starts.



