[ https://issues.apache.org/jira/browse/AMBARI-17182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Victor Galgo updated AMBARI-17182:
----------------------------------
    Attachment: nnha_fix.patch

I have run 'mvn clean test' for ambari-web. All tests pass:
{code}
Calling set on destroyed view
Calling set on destroyed view
Calling set on destroyed view
Calling set on destroyed view

  28668 tests complete (34 seconds)
  154 tests pending

[INFO]
[INFO] --- apache-rat-plugin:0.11:check (default) @ ambari-web ---
[INFO] 51 implicit excludes (use -debug for more details).
[INFO] Exclude: .idea/**
[INFO] Exclude: package.json
[INFO] Exclude: public/**
[INFO] Exclude: public-static/**
[INFO] Exclude: app/assets/**
[INFO] Exclude: vendor/**
[INFO] Exclude: node_modules/**
[INFO] Exclude: node/**
[INFO] Exclude: npm-debug.log
[INFO] 1425 resources included (use -debug for more details)
Warning: org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
Compiler warnings:
  WARNING: 'org.apache.xerces.jaxp.SAXParserImpl: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.'
Warning: org.apache.xerces.parsers.SAXParser: Feature 'http://javax.xml.XMLConstants/feature/secure-processing' is not recognized.
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://javax.xml.XMLConstants/property/accessExternalDTD' is not recognized.
Warning: org.apache.xerces.parsers.SAXParser: Property 'http://www.oracle.com/xml/jaxp/properties/entityExpansionLimit' is not recognized.
[INFO] Rat check: Summary of files. Unapproved: 0 unknown: 0 generated: 0 approved: 1425 licence.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1:31.015s
[INFO] Finished at: Sun Jun 12 14:37:47 EEST 2016
[INFO] Final Memory: 13M/407M
[INFO] ------------------------------------------------------------------------
{code}

[~sumitmohanty] could you please help commit this?

> App timeline Server start fails on enabling HA because namenode is in safemode
> ------------------------------------------------------------------------------
>
>                 Key: AMBARI-17182
>                 URL: https://issues.apache.org/jira/browse/AMBARI-17182
>             Project: Ambari
>          Issue Type: Bug
>    Affects Versions: 2.4.0
>            Reporter: Victor Galgo
>            Priority: Critical
>              Labels: ha, namenode
>             Fix For: 2.4.0
>
>         Attachments: nnha_fix.patch
>
>
> On the last step ("Start all") of enabling HA, the following happens:
> {code}
> Traceback (most recent call last):
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 147, in <module>
>     ApplicationTimelineServer().execute()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 219, in execute
>     method(env)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 43, in start
>     self.configure(env) # FOR SECURITY
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/application_timeline_server.py", line 54, in configure
>     yarn(name='apptimelineserver')
>   File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
>     return fn(*args, **kwargs)
>   File "/var/lib/ambari-agent/cache/common-services/YARN/2.1.0.2.0/package/scripts/yarn.py", line 276, in yarn
>     mode=0755
>   File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 154, in __init__
>     self.env.run()
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
>     self.run_action(resource, action)
>   File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
>     provider_action()
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 463, in action_create_on_execute
>     self.action_delayed("create")
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 460, in action_delayed
>     self.get_hdfs_resource_executor().action_delayed(action_name, self)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 259, in action_delayed
>     self._set_mode(self.target_status)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 366, in _set_mode
>     self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
>   File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 195, in run_command
>     raise Fail(err_msg)
> resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT 'http://os-s11-3-iavzl-nat-s-ru242to25susesecha-12.openstacklocal:50070/webhdfs/v1/ats/done?op=SETPERMISSION&user.name=hdfs&permission=755'' returned status_code=403.
> {
>   "RemoteException": {
>     "exception": "RetriableException",
>     "javaClassName": "org.apache.hadoop.ipc.RetriableException",
>     "message": "org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 675 needs additional 16 blocks to reach the threshold 0.9900 of total blocks 697.\nThe number of live datanodes 20 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
>   }
> }
> {code}
> This happens because the NN is not yet out of safemode when ATS starts, since the DNs have only just started.
> To fix this, "stop namenodes" has to be triggered before "Start all".
> If this is done, "Start all" ensures that the datanodes start before the NNs, and that the NNs are out of safemode before ATS starts.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
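
For readers following the failure above: the 403 carries a RetriableException whose message names SafeModeException, so the request would have succeeded once the NameNode left safe mode. The sketch below only illustrates that failure mode. It is not what nnha_fix.patch does (the patch changes the wizard's step order in ambari-web), and it is not Ambari's hdfs_resource.py code. The names is_safemode_error and retry_while_in_safemode, and the request callable returning (status_code, body), are hypothetical.

```python
import json
import time


def is_safemode_error(body):
    """Return True if a WebHDFS error body reports a safe mode failure.

    Expects the JSON shape shown in the traceback above:
    {"RemoteException": {"message": "...SafeModeException: ..."}}
    """
    try:
        remote = json.loads(body).get("RemoteException", {})
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False
    if not isinstance(remote, dict):
        return False
    return "SafeModeException" in str(remote.get("message", ""))


def retry_while_in_safemode(request, attempts=10, delay=5.0, sleep=time.sleep):
    """Call request() until it stops failing with a safe mode error.

    request is a hypothetical callable returning (status_code, body).
    The last response is returned once a call succeeds, fails for a
    different reason, or the attempts are exhausted.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        status, body = request()
        if not (status == 403 and is_safemode_error(body)):
            return status, body
        if attempt < attempts - 1:
            sleep(delay)  # give the DataNodes time to report their blocks
    return status, body
```

A caller would wrap the SETPERMISSION request in a function and pass it as request. Retrying only masks the ordering problem, which is why the report proposes restarting the NameNodes before "Start all" instead.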