Aravindan Vijayan created AMBARI-19204:
------------------------------------------
Summary: Metrics monitor start failed after deleting AMS and reinstalling with different user
Key: AMBARI-19204
URL: https://issues.apache.org/jira/browse/AMBARI-19204
Project: Ambari
Issue Type: Bug
Components: ambari-metrics
Affects Versions: 2.5.0
Reporter: Aravindan Vijayan
Assignee: Aravindan Vijayan
Fix For: 2.5.0
Steps to reproduce:
1) Delete the Ambari Metrics (AMS) service along with Tez, HBase, Sqoop, Oozie, Falcon, Storm, Ambari Infra, Kafka, Knox, Log Search, SmartSense, Mahout and Slider.
2) Add all the deleted services back.
The Metrics Monitor then fails to start with:
{noformat}
Traceback (most recent call last):
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 68, in <module>
    AmsMonitor().execute()
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/script/script.py", line 282, in execute
    method(env)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/metrics_monitor.py", line 42, in start
    action = 'start'
  File "/usr/lib/python2.6/site-packages/ambari_commons/os_family_impl.py", line 89, in thunk
    return fn(*args, **kwargs)
  File "/var/lib/ambari-agent/cache/common-services/AMBARI_METRICS/0.1.0/package/scripts/ams_service.py", line 103, in ams_service
    user=params.ams_user
  File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
    self.env.run()
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
    self.run_action(resource, action)
  File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
    provider_action()
  File "/usr/lib/python2.6/site-packages/resource_management/core/providers/system.py", line 262, in action_run
    tries=self.resource.tries, try_sleep=self.resource.try_sleep)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
    result = function(command, **kwargs)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
    tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
    result = _call(command, **kwargs_copy)
  File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 303, in _call
    raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of '/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start' returned 255. ######## Hortonworks #############
This is MOTD message, added for testing in qe infra
psutil build directory is not empty, continuing...
Verifying Python version compatibility...
Using python /usr/bin/python2.6
Checking for previously running Metric Monitor...
Starting ambari-metrics-monitor
/usr/sbin/ambari-metrics-monitor: line 148: /grid/0/log/metric_monitor/ambari-metrics-monitor.out: Permission denied
Verifying ambari-metrics-monitor process status...
ERROR: ambari-metrics-monitor start failed. For more details, see /grid/0/log/metric_monitor/ambari-metrics-monitor.out:
====================
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:41,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:37:51,956 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:01,957 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:11,958 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640'
2016-12-14 05:38:21,959 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
====================
Monitor out at: /grid/0/log/metric_monitor/ambari-metrics-monitor.out
stdout: /var/lib/ambari-agent/data/output-1028.txt
2016-12-14 06:12:10,119 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,432 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:10,433 - Group['cstm-knox-group'] {}
2016-12-14 06:12:10,434 - Group['hadoop'] {}
2016-12-14 06:12:10,435 - Group['users'] {}
2016-12-14 06:12:10,435 - User['zookeeper'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,436 - User['infra-solr'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,437 - User['cstm-sqoop'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,438 - User['cstm-ams'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,439 - User['cstm-tez'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,441 - User['cstm-storm'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,442 - User['cstm-knox'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,443 - User['cstm-flume'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-mahout'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,444 - User['cstm-hbase'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,445 - User['logsearch'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,446 - User['cstm-falcon'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,447 - User['ambari-qa'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,448 - User['kafka'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,449 - User['hdfs'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,450 - User['cstm-oozie'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['users']}
2016-12-14 06:12:10,451 - User['yarn'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,452 - User['mapred'] {'gid': 'hadoop', 'fetch_nonlocal_groups': True, 'groups': ['hadoop']}
2016-12-14 06:12:10,453 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,612 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] {'not_if': '(test $(id -u ambari-qa) -gt 1000) || (false)'}
2016-12-14 06:12:10,626 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh ambari-qa /tmp/hadoop-ambari-qa,/tmp/hsperfdata_ambari-qa,/home/ambari-qa,/tmp/ambari-qa,/tmp/sqoop-ambari-qa'] due to not_if
2016-12-14 06:12:10,627 - Directory['/tmp/hbase-hbase'] {'owner': 'cstm-hbase', 'create_parents': True, 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:10,826 - File['/var/lib/ambari-agent/tmp/changeUid.sh'] {'content': StaticFile('changeToSecureUid.sh'), 'mode': 0555}
2016-12-14 06:12:10,963 - Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] {'not_if': '(test $(id -u cstm-hbase) -gt 1000) || (false)'}
2016-12-14 06:12:10,983 - Skipping Execute['/var/lib/ambari-agent/tmp/changeUid.sh cstm-hbase /home/cstm-hbase,/tmp/cstm-hbase,/usr/bin/cstm-hbase,/var/log/cstm-hbase,/tmp/hbase-hbase'] due to not_if
2016-12-14 06:12:10,984 - Group['hdfs'] {}
2016-12-14 06:12:10,984 - User['hdfs'] {'fetch_nonlocal_groups': True, 'groups': ['hadoop', 'hdfs']}
2016-12-14 06:12:10,985 - FS Type:
2016-12-14 06:12:10,985 - Directory['/etc/hadoop'] {'mode': 0755}
2016-12-14 06:12:11,068 - File['/usr/hdp/current/hadoop-client/conf/hadoop-env.sh'] {'content': InlineTemplate(...), 'owner': 'root', 'group': 'hadoop'}
2016-12-14 06:12:11,192 - Directory['/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir'] {'owner': 'hdfs', 'group': 'hadoop', 'mode': 01777}
2016-12-14 06:12:11,296 - Execute[('setenforce', '0')] {'not_if': '(! which getenforce ) || (which getenforce && getenforce | grep -q Disabled)', 'sudo': True, 'only_if': 'test -f /selinux/enforce'}
2016-12-14 06:12:11,317 - Skipping Execute[('setenforce', '0')] due to not_if
2016-12-14 06:12:11,317 - Directory['/grid/0/log/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'hadoop', 'mode': 0775, 'cd_access': 'a'}
2016-12-14 06:12:11,603 - Directory['/grid/0/pid/hdfs'] {'owner': 'root', 'create_parents': True, 'group': 'root', 'cd_access': 'a'}
2016-12-14 06:12:11,671 - Changing owner for /grid/0/pid/hdfs from 1021 to root
2016-12-14 06:12:11,671 - Changing group for /grid/0/pid/hdfs from 1006 to root
2016-12-14 06:12:11,861 - Directory['/tmp/hadoop-hdfs'] {'owner': 'hdfs', 'create_parents': True, 'cd_access': 'a'}
2016-12-14 06:12:12,019 - File['/usr/hdp/current/hadoop-client/conf/commons-logging.properties'] {'content': Template('commons-logging.properties.j2'), 'owner': 'root'}
2016-12-14 06:12:12,143 - File['/usr/hdp/current/hadoop-client/conf/health_check'] {'content': Template('health_check.j2'), 'owner': 'root'}
2016-12-14 06:12:12,248 - File['/usr/hdp/current/hadoop-client/conf/log4j.properties'] {'content': ..., 'owner': 'hdfs', 'group': 'hadoop', 'mode': 0644}
2016-12-14 06:12:12,380 - File['/usr/hdp/current/hadoop-client/conf/hadoop-metrics2.properties'] {'content': InlineTemplate(...), 'owner': 'hdfs', 'group': 'hadoop'}
2016-12-14 06:12:12,482 - File['/usr/hdp/current/hadoop-client/conf/task-log4j.properties'] {'content': StaticFile('task-log4j.properties'), 'mode': 0755}
2016-12-14 06:12:12,597 - File['/usr/hdp/current/hadoop-client/conf/configuration.xsl'] {'owner': 'hdfs', 'group': 'hadoop'}
2016-12-14 06:12:12,672 - File['/etc/hadoop/conf/topology_mappings.data'] {'owner': 'hdfs', 'content': Template('topology_mappings.data.j2'), 'only_if': 'test -d /etc/hadoop/conf', 'group': 'hadoop'}
2016-12-14 06:12:12,823 - File['/etc/hadoop/conf/topology_script.py'] {'content': StaticFile('topology_script.py'), 'only_if': 'test -d /etc/hadoop/conf', 'mode': 0755}
2016-12-14 06:12:13,461 - Using hadoop conf dir: /usr/hdp/current/hadoop-client/conf
2016-12-14 06:12:13,466 - checked_call['hostid'] {}
2016-12-14 06:12:13,485 - checked_call returned (0, '1bac0d12')
2016-12-14 06:12:13,488 - Directory['/etc/ambari-metrics-monitor/conf'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True}
2016-12-14 06:12:13,581 - Directory['/grid/0/log/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755}
2016-12-14 06:12:13,693 - Directory['/grid/0/pid/metric_monitor'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'mode': 0755, 'cd_access': 'a'}
2016-12-14 06:12:13,971 - Directory['/usr/lib/python2.6/site-packages/resource_monitoring/psutil/build'] {'owner': 'cstm-ams', 'group': 'hadoop', 'create_parents': True, 'cd_access': 'a'}
2016-12-14 06:12:14,387 - Execute['ambari-sudo.sh chown -R cstm-ams:hadoop /usr/lib/python2.6/site-packages/resource_monitoring'] {}
2016-12-14 06:12:14,411 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,421 - File['/etc/ambari-metrics-monitor/conf/metric_monitor.ini'] {'content': Template('metric_monitor.ini.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
2016-12-14 06:12:14,549 - TemplateConfig['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'owner': 'cstm-ams', 'template_tag': None, 'group': 'hadoop'}
2016-12-14 06:12:14,551 - File['/etc/ambari-metrics-monitor/conf/metric_groups.conf'] {'content': Template('metric_groups.conf.j2'), 'owner': 'cstm-ams', 'group': 'hadoop', 'mode': None}
2016-12-14 06:12:14,672 - File['/etc/ambari-metrics-monitor/conf/ams-env.sh'] {'content': InlineTemplate(...), 'owner': 'cstm-ams'}
2016-12-14 06:12:14,814 - Execute['/usr/sbin/ambari-metrics-monitor --config /etc/ambari-metrics-monitor/conf start'] {'user': 'cstm-ams'}
2016-12-14 06:12:16,884 - Execute['find /grid/0/log/metric_monitor -maxdepth 1 -type f -name '*' -exec echo '==> {} <==' \; -exec tail -n 40 {} \;'] {'logoutput': True, 'ignore_failures': True, 'user': 'cstm-ams'}
######## Hortonworks #############
This is MOTD message, added for testing in qe infra
==> /grid/0/log/metric_monitor/ambari-metrics-monitor.out <==
2016-12-14 05:35:21,946 [ERROR] host_info.py:194 - Failed to read disk_usage for a mountpoint : [Errno 13] Permission denied: '/ycloud-grid/0/hadoop/yarn/local/usercache/root/appcache/application_1481604818073_0640/container_e83_1481604818073_0640_01_000007'
2016-12-14 05:35:27,256 [INFO] emitter.py:152 - Calculated collector shard based on hostname : ctr-e83-1481604818073-0640-01-000006.hwx.site
{noformat}
NOTE: During the initial cluster installation, AMS was installed as user ams; when AMS was re-added, it was configured with a custom user (cstm-ams).