Aravindan Vijayan created AMBARI-18191:
------------------------------------------

             Summary: "Restart all required" services operation failed at Metrics Collector since HDFS was not yet up
                 Key: AMBARI-18191
                 URL: https://issues.apache.org/jira/browse/AMBARI-18191
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics
    Affects Versions: 2.4.0
            Reporter: Aravindan Vijayan
            Assignee: Aravindan Vijayan
            Priority: Blocker
             Fix For: 2.4.0


ambari-server --hash
4017036da951a10f519a578de934308cf866ba50

*Steps*
# Deploy an HDP-2.3.6 cluster with Ambari 2.2.2.0 (AMS is configured in distributed mode)
# Upgrade Ambari to 2.4.0.0 and let the upgrade complete
# Open the Ambari web UI and click "Restart all required" under the Actions menu

*Result*
The operation fails while restarting Metrics Collector, because the collector 
makes a WebHDFS call before HDFS has been started:
{code}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 148, in <module>
    AmsCollector().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 280, in execute
    method(env)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 725, in restart
    self.start(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 46, in start
    self.configure(env, action = 'start') # for security
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_collector.py", line 41, in configure
    hbase('master', action)
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/hbase.py", line 213, in hbase
    dfs_type=params.dfs_type
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 459, in action_create_on_execute
    self.action_delayed("create")
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 456, in action_delayed
    self.get_hdfs_resource_executor().action_delayed(action_name, self)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 256, in action_delayed
    self._set_mode(self.target_status)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 363, in _set_mode
    self.util.run_command(self.main_resource.resource.target, 'SETPERMISSION', method='PUT', permission=self.mode, assertable_result=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/providers/hdfs_resource.py", line 179, in run_command
    _, out, err = get_user_call_output(cmd, user=self.run_user, logoutput=self.logoutput, quiet=False)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/get_user_call_output.py", line 61, in get_user_call_output
    raise Fail(err_msg)
resource_management.core.exceptions.Fail: Execution of 'curl -sS -L -w '%{http_code}' -X PUT --negotiate -u : 'http://vsharma-eu-mt-5.openstacklocal:50070/webhdfs/v1/user/ams/hbase?op=SETPERMISSION&user.name=hdfs&permission=775' 1>/tmp/tmp8twcZt 2>/tmp/tmpLPih9a' returned 7. curl: (7) couldn't connect to host
401
{code}
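
The curl exit code 7 above means the NameNode HTTP port (50070 in the failing URL) was not accepting connections. As a diagnostic aid, a reachability check along the following lines could confirm this before retrying. This is only a sketch: `is_port_open` and `wait_for_port` are hypothetical helper names, not part of Ambari's `resource_management` library, and the retry counts are arbitrary.

```python
import socket
import time


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except socket.error:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
    sock.close()
    return True


def wait_for_port(host, port, attempts=5, delay=1.0):
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if is_port_open(host, port):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For the host in the traceback, `wait_for_port("vsharma-eu-mt-5.openstacklocal", 50070)` would return `False` for as long as the NameNode is down, which matches the "couldn't connect to host" error.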

Afterwards, I restarted HDFS individually first and then hit "Restart all 
required" again; this time the operation was successful.
It looks like the restart order is incorrect across the hosts, so the services 
that others depend on (here, HDFS) are not brought up first.
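
To illustrate the ordering problem, here is a minimal sketch of a dependency-respecting restart order computed with a topological sort (Kahn's algorithm). It is not Ambari's actual scheduling code: `restart_order` is a hypothetical function, and the dependency map contains only the one relationship this report implies (AMS in distributed mode needs HDFS).

```python
from collections import deque


def restart_order(depends_on):
    """Return services ordered so each one follows everything it depends on."""
    services = set(depends_on)
    for deps in depends_on.values():
        services.update(deps)

    # Unmet dependencies per service, and the reverse mapping.
    remaining = dict((s, set(depends_on.get(s, ()))) for s in services)
    dependents = dict((s, []) for s in services)
    for svc, deps in remaining.items():
        for dep in deps:
            dependents[dep].append(svc)

    # Start with services that depend on nothing.
    ready = deque(sorted(s for s, deps in remaining.items() if not deps))
    order = []
    while ready:
        svc = ready.popleft()
        order.append(svc)
        for dependent in sorted(dependents[svc]):
            remaining[dependent].discard(svc)
            if not remaining[dependent]:
                ready.append(dependent)

    if len(order) != len(services):
        raise ValueError("dependency cycle detected")
    return order


# Assumed dependency for illustration: Metrics Collector needs HDFS.
deps = {"AMBARI_METRICS": ["HDFS"]}
print(restart_order(deps))  # ['HDFS', 'AMBARI_METRICS']
```

With this ordering HDFS is restarted before Metrics Collector, which is the sequence that succeeded when done manually.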



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
