Aravindan Vijayan created AMBARI-19204:
------------------------------------------

             Summary: Metrics monitor start failed after deleting AMS and reinstalling with different user
                 Key: AMBARI-19204
                 URL: https://issues.apache.org/jira/browse/AMBARI-19204
             Project: Ambari
          Issue Type: Bug
          Components: ambari-metrics
    Affects Versions: 2.5.0
            Reporter: Aravindan Vijayan
            Assignee: Aravindan Vijayan
             Fix For: 2.5.0


STR: 
1) Delete Service AMS along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari 
Infra, Ambari Metrics, Kafka, Knox, Log Search, Smartsense, Mahout, Slider
2) Add all the deleted services back

Metrics monitor fails to start with:
{noformat}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 282, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
    action = 'start'
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
    user=params.ams_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
psutil build directory is not empty, continuing...
Verifying Python version compatibility...
Using python  /usr/bin/python2.6
Checking for previously running Metric Monitor...
Starting ambari-metrics-monitor
/usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out: Permission denied
Verifying ambari-metrics-monitor process status...
ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
====================
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
====================
Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
stdout:   /var/lib/ambari-agent/data/output-1028.txt

2016-12-14 06:12:10,119 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,432 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
2016-12-14 06:12:10,434 - Group['hadoop'] {}
2016-12-14 06:12:10,435 - Group['users'] {}
2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 
'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] 
{'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh 
ambari-qa 
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
 {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-12-14 06:12:10,626 - Skipping 
Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa 
/tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa']
 due to not_if
2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 
'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] 
{'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh 
cstm-hbase 
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
 {'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
2016-12-14 06:12:10,983 - Skipping 
Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase 
/home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase']
 due to not_if
2016-12-14 06:12:10,984 - Group['hdfs'] {}
2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 
'groups': ['hadoop', 'hdfs']}
2016-12-14 06:12:10,985 - FS Type: 
2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
2016-12-14 06:12:11,068 - 
File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': 
InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
2016-12-14 06:12:11,192 - 
Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 
'group': 'hadoop', 'mode': 01777}
2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which 
getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': 
True, 'only_if': 'test -f /selinux/enforce'}
2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 
'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 
'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 
'create_parents': True, 'cd_access': 'a'}
2016-12-14 06:12:12,019 - 
File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] 
{'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
2016-12-14 06:12:12,143 - 
File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': 
Template('health_check.j2'), 'owner': 'root'}
2016-12-14 06:12:12,248 - 
File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 
'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-14 06:12:12,380 - 
File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] 
{'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-12-14 06:12:12,482 - 
File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': 
StaticFile('task-log4j.properties'), 'mode': 0755}
2016-12-14 06:12:12,597 - 
File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 
'group': 'hadoop'}
2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] 
{'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 
'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] 
{'content': StaticFile('topology_script.py'), 'only_if': 'test -d 
/etc/hadoop/conf', 'mode': 0755}
2016-12-14 06:12:13,461 - Using hadoop conf dir: 
/usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:13,466 - checked_call['hostid'] {}
2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] 
{'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True}
2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 
'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755}
2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 
'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 
'cd_access': 'a'}
2016-12-14 06:12:13,971 - 
Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build'] 
{'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 
'a'}
2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop 
/usr/lib/python2.6/site-packages/resource_monitoring'] {}
2016-12-14 06:12:14,411 - 
TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'owner': 
'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,421 - 
File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'content': 
Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 
'mode': None}
2016-12-14 06:12:14,549 - 
TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'owner': 
'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,551 - 
File['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'content': 
Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 
'mode': None}
2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] 
{'content': InlineTemplate(...), 'owner': 'cstm-ams'}
2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config 
/etc/ambari-metrics-monitor/conf start'] {'user': 'cstm-ams'}
2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 
-type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] 
{'logoutput': True, 'ignore_failures': True, 'user': 'cstm-ams'}
######## Hortonworks #############
This is MOTD message, added for testing in qe infra
==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage 
for a mountpoint : [Errno 13] Permission denied: 
'/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard 
based on hostname : ctr-e83-1481604818073-0640-01-000006.hwx.site
{noformat}

NOTE: During the initial cluster installation, AMS was installed as user ams, but 
when AMS was re-added, it was installed as a custom user (cstm-ams).
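
One plausible reading of the log above, offered as an assumption rather than a confirmed root cause: files left in /grid/0/log/metric_monitor by the original ams user are still owned by that user, because the Directory resource shown in the log sets the owner on the directory itself, and the new cstm-ams user therefore cannot write to ambari-metrics-monitor.out. A minimal Python sketch for spotting such leftover files is below. The helper name find_stale_files is hypothetical and not part of Ambari, and the demo runs against a throwaway temporary directory rather than the real log path.

```python
import os
import tempfile


def find_stale_files(log_dir, expected_uid):
    """Return sorted paths under log_dir whose owner uid is not expected_uid.

    Hypothetical diagnostic helper (not an Ambari API): it only reads
    ownership metadata and never changes anything on disk.
    """
    stale = []
    for root, dirs, files in os.walk(log_dir):
        for name in dirs + files:
            path = os.path.join(root, name)
            # lstat so that symlinks are inspected themselves, not their targets
            if os.lstat(path).st_uid != expected_uid:
                stale.append(path)
    return sorted(stale)


if __name__ == "__main__":
    # Demo on a temporary directory standing in for the monitor log dir.
    with tempfile.TemporaryDirectory() as demo_dir:
        out_file = os.path.join(demo_dir, "ambari-metrics-monitor.out")
        open(out_file, "w").close()
        # Files created here belong to the current user, so nothing is stale.
        print(find_stale_files(demo_dir, os.getuid()))
        # Pretending a different uid is expected flags the leftover file.
        print(find_stale_files(demo_dir, os.getuid() + 1))
```

On an affected host, the expected uid would be that of the new service user (for example, looked up with pwd.getpwnam('cstm-ams').pw_uid), and the directory would be the monitor log directory from the log output.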



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
