Yesha Vora created AMBARI-19930:
-----------------------------------
Summary: The service check status was set to TIMEOUT even if
service check was failed
Key: AMBARI-19930
URL: https://issues.apache.org/jira/browse/AMBARI-19930
Project: Ambari
Issue Type: Bug
Reporter: Yesha Vora
Steps to reproduce:
* Install a cluster with Hadoop, Tez, HBase, Hive, Spark
* Enable Wire encryption
* Run Tez service check
Here, agent.service.check.task.timeout is set to 600 seconds. The Tez
application was started in the background. The service check then looks for the
SUCCESS file on HDFS for only a couple of minutes. In this particular instance,
the application took 5 minutes to run, so the check for the SUCCESS file failed
well before the 600-second task timeout.
In this scenario, the service check status should be FAILED instead of TIMEOUT.
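The expected behaviour described above can be sketched as a small decision
function. This is only an illustration of the requested outcome, not Ambari's
actual code: the function name classify_service_check and its parameters are
hypothetical, and the status strings are assumed to follow the TIMEDOUT naming
seen in the request output.

```python
from typing import Optional


def classify_service_check(elapsed_sec: float,
                           task_timeout_sec: float,
                           check_passed: Optional[bool]) -> str:
    """Return the status a service check is expected to report.

    check_passed is None while the check has produced no result yet,
    True if it succeeded, and False if it failed (for example, the
    SUCCESS file was not found on HDFS).
    """
    if check_passed is None:
        # No result yet: only now does the task timeout decide the status.
        if elapsed_sec >= task_timeout_sec:
            return "TIMEDOUT"
        return "IN_PROGRESS"
    # The check produced a result, so report that result as-is.
    return "COMPLETED" if check_passed else "FAILED"


# Scenario from this report: the check failed after about 5 minutes,
# with a 600-second task timeout.
print(classify_service_check(300, 600, False))
```

Under these assumptions, a check that fails after roughly 300 seconds is
reported as FAILED, and TIMEDOUT is reserved for a check that returns nothing
within the 600 seconds.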
{code}
stderr: /var/lib/ambari-agent/data/errors-370.txt
stdout: /var/lib/ambari-agent/data/output-370.txt
2017-02-08 03:55:55,017 - HdfsResource['/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz']
{'security_enabled': True, 'hadoop_bin_dir':
'/usr/hdp/current/hadoop-client/bin', 'keytab':
'/etc/security/keytabs/hdfs.headless.keytab', 'source':
'/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz', 'dfs_type': '', 'default_fs':
'hdfs://host:8020', 'replace_existing_files': False,
'hdfs_resource_ignore_file':
'/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ...,
'kinit_path_local': '/usr/bin/kinit', 'principal_name': '[email protected]',
'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'type': 'file', 'action':
['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse',
u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0444}
2017-02-08 03:55:55,017 - Execute['/usr/bin/kinit -kt
/etc/security/keytabs/hdfs.headless.keytab [email protected]'] {'user': 'hdfs'}
2017-02-08 03:55:55,096 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl
-sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : -k
'"'"'https://host:50470/webhdfs/v1/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz?op=GETFILESTATUS&user.name=hdfs'"'"'
1>/tmp/tmpoIadeN 2>/tmp/tmp6nFiLj''] {'logoutput': None, 'quiet': False}
2017-02-08 03:55:55,292 - call returned (0, '')
2017-02-08 03:55:55,293 - DFS file /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz is
identical to /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz, skipping the copying
2017-02-08 03:55:55,293 - Will attempt to copy tez tarball from
/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz to DFS at
/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz.
2017-02-08 03:55:55,293 - HdfsResource[None] {'security_enabled': True,
'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab':
'/etc/security/keytabs/hdfs.headless.keytab', 'dfs_type': '', 'default_fs':
'hdfs://host:8020', 'hdfs_resource_ignore_file':
'/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ...,
'kinit_path_local': '/usr/bin/kinit', 'principal_name': '[email protected]',
'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir':
'/usr/hdp/current/hadoop-client/conf', 'immutable_paths':
[u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
2017-02-08 03:55:55,294 - Execute['/usr/bin/kinit -kt
/etc/security/keytabs/smokeuser.headless.keytab [email protected];']
{'user': 'ambari-qa'}
2017-02-08 03:55:55,389 - ExecuteHadoop['jar
/usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
/tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5,
'tries': 3, 'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'user':
'ambari-qa', 'conf_dir': '/usr/hdp/current/hadoop-client/conf'}
2017-02-08 03:55:55,390 - Execute['hadoop --config
/usr/hdp/current/hadoop-client/conf jar
/usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount
/tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None,
'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path':
['/usr/hdp/current/hadoop-client/bin']}
{code}
{code}
Requests: {
aborted_task_count: 0,
cluster_name: "cl1",
completed_task_count: 1,
create_time: 1486526151743,
end_time: 1486526463038,
exclusive: false,
failed_task_count: 0,
id: 29,
inputs: "{}",
operation_level: null,
progress_percent: 100,
queued_task_count: 0,
request_context: "WE API TEZ Service Check",
request_schedule: null,
request_status: "TIMEDOUT",
resource_filters: [
{
service_name: "TEZ"
}
],
start_time: 1486526151751,
task_count: 1,
timed_out_task_count: 1,
type: "COMMAND"
},
{code}
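As a quick cross-check of the request output above, the elapsed time can be
computed from start_time and end_time. The sketch below assumes those fields
are epoch timestamps in milliseconds and that the 600-second value of
agent.service.check.task.timeout applies to this request; the helper name
elapsed_seconds is hypothetical.

```python
# Values copied from the request output above.
START_TIME_MS = 1486526151751
END_TIME_MS = 1486526463038

# Assumed to be the timeout that applies to this request (in seconds).
TASK_TIMEOUT_SEC = 600


def elapsed_seconds(start_ms: int, end_ms: int) -> float:
    """Convert a pair of epoch-millisecond timestamps to a duration in seconds."""
    return (end_ms - start_ms) / 1000.0


elapsed = elapsed_seconds(START_TIME_MS, END_TIME_MS)
print("elapsed: %.3f s, timeout: %d s" % (elapsed, TASK_TIMEOUT_SEC))
print("finished before timeout:", elapsed < TASK_TIMEOUT_SEC)
```

If those assumptions hold, the request ended after about 311 seconds, well
short of the 600-second timeout, which is consistent with the check having
failed rather than timed out.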