Alejandro Fernandez created AMBARI-12013:
--------------------------------------------
Summary: Datanode failed to restart during RU because the
shutdownDatanode -upgrade command can fail sometimes
Key: AMBARI-12013
URL: https://issues.apache.org/jira/browse/AMBARI-12013
Project: Ambari
Issue Type: Bug
Components: ambari-server
Affects Versions: 2.1.0
Reporter: Alejandro Fernandez
Assignee: Alejandro Fernandez
Priority: Critical
Fix For: 2.1.0
Deploy Test with RU from HDP 2.2.0.0-2041 to HDP-2.3.0.0-2398
Failed on: Restarting DataNode on ip-172-31-44-83.ec2.internal
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 151, in <module>
    DataNode().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 216, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 437, in restart
    self.stop(env, rolling_restart=rolling_restart)
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode.py", line 55, in stop
    datanode_upgrade.pre_upgrade_shutdown()
  File "/var/lib/ambari-agent/cache/common-services/HDFS/2.1.0.2.0/package/scripts/datanode_upgrade.py", line 43, in pre_upgrade_shutdown
    Execute(command, user=params.hdfs_user, tries=1 )
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 157, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 152, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 118, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 254, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 70, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 92, in checked_call
    tries=tries, try_sleep=try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 140, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 291, in _call
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'hdfs dfsadmin -shutdownDatanode 0.0.0.0:8010 upgrade' returned 255. shutdownDatanode: Shutdown already in progress.
{code}
HDP 2.2.0.0 has a known issue (HDFS-7533) in which shutting down the
DataNode fails because sendOOB() is attempted for every writer even though
not all writers have a responder running.
If the shutdown command fails with the output "Shutdown already in progress",
Ambari should fall back to calling datanode(action="stop"), which under the
hood runs "hadoop-daemon.sh stop datanode".
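
A minimal sketch of the proposed fallback, not the actual patch. The names
shutdown_datanode_for_upgrade, run_command and fallback_stop are hypothetical
and exist only for illustration; in Ambari the fallback would be
datanode(action="stop"), and the command would be issued through the
resource_management Execute resource rather than subprocess directly.

```python
import subprocess

# Message printed by dfsadmin in the failure reported above.
ALREADY_IN_PROGRESS = "Shutdown already in progress"


def run_command(args):
    """Run a command and return (exit_code, combined stdout/stderr text)."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    out, _ = proc.communicate()
    return proc.returncode, out.decode("utf-8", "replace")


def shutdown_datanode_for_upgrade(ipc_address, fallback_stop, run=run_command):
    """Try the upgrade-aware shutdown; fall back to a plain stop if needed.

    fallback_stop is a hypothetical callable standing in for
    datanode(action="stop"). Returns True if dfsadmin succeeded and
    False if the fallback was used.
    """
    command = ["hdfs", "dfsadmin", "-shutdownDatanode", ipc_address, "upgrade"]
    code, output = run(command)
    if code == 0:
        return True
    if ALREADY_IN_PROGRESS in output:
        # Known HDFS-7533 symptom: stop the daemon the ordinary way instead.
        fallback_stop()
        return False
    # Any other failure is unexpected and should still surface.
    raise RuntimeError("shutdownDatanode failed (%d): %s" % (code, output))
```

The run parameter is injectable so the decision logic can be exercised without
a live cluster. Only the "Shutdown already in progress" output triggers the
fallback; other non-zero exits still raise.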