Yesha Vora created AMBARI-19930:
-----------------------------------

             Summary: The service check status was set to TIMEOUT even if 
service check was failed
                 Key: AMBARI-19930
                 URL: https://issues.apache.org/jira/browse/AMBARI-19930
             Project: Ambari
          Issue Type: Bug
            Reporter: Yesha Vora


Steps to reproduce:
* Install a cluster with Hadoop, Tez, HBase, Hive, Spark
* Enable wire encryption
* Run Tez service check

Here, agent.service.check.task.timeout is set to 600 seconds. The Tez 
application was started in the background. The service check then looks for 
the SUCCESS file for only a couple of minutes. In this particular instance, 
the application took 5 minutes to run, so the check for the SUCCESS file on 
HDFS failed. 

In this scenario, the service check status should be FAILED instead of TIMEDOUT.
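
The expected behaviour can be sketched as a small status rule. This is only an 
illustration, not Ambari's actual code: the function name 
classify_service_check and its parameters are hypothetical, and the status 
strings are assumed to match the ones Ambari reports for tasks.

```python
def classify_service_check(check_succeeded, elapsed_s, task_timeout_s=600):
    """Hypothetical rule: pick a final status for a finished service check.

    check_succeeded -- True if the check found what it was looking for
    elapsed_s       -- wall-clock seconds the task ran
    task_timeout_s  -- assumed to mirror agent.service.check.task.timeout
    """
    if check_succeeded:
        return "COMPLETED"
    # A check that gave up before the task timeout elapsed is a failure,
    # not a timeout.
    if elapsed_s < task_timeout_s:
        return "FAILED"
    return "TIMEDOUT"


# The case described in this report: the check failed after about 5 minutes
# while the timeout was 600 seconds.
print(classify_service_check(False, 5 * 60))
```

Under this rule the run described above would be reported as FAILED, because 
the check stopped well before the 600-second limit.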

{code}
stderr:   /var/lib/ambari-agent/data/errors-370.txt

stdout:   /var/lib/ambari-agent/data/output-370.txt

2017-02-08 03:55:55,017 - HdfsResource['/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz'] 
{'security_enabled': True, 'hadoop_bin_dir': 
'/usr/hdp/current/hadoop-client/bin', 'keytab': 
'/etc/security/keytabs/hdfs.headless.keytab', 'source': 
'/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz', 'dfs_type': '', 'default_fs': 
'hdfs://host:8020', 'replace_existing_files': False, 
'hdfs_resource_ignore_file': 
'/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 
'kinit_path_local': '/usr/bin/kinit', 'principal_name': '[email protected]', 
'user': 'hdfs', 'owner': 'hdfs', 'group': 'hadoop', 'hadoop_conf_dir': 
'/usr/hdp/current/hadoop-client/conf', 'type': 'file', 'action': 
['create_on_execute'], 'immutable_paths': [u'/apps/hive/warehouse', 
u'/mr-history/done', u'/app-logs', u'/tmp'], 'mode': 0444}
2017-02-08 03:55:55,017 - Execute['/usr/bin/kinit -kt 
/etc/security/keytabs/hdfs.headless.keytab [email protected]'] {'user': 'hdfs'}
2017-02-08 03:55:55,096 - call['ambari-sudo.sh su hdfs -l -s /bin/bash -c 'curl 
-sS -L -w '"'"'%{http_code}'"'"' -X GET --negotiate -u : -k 
'"'"'https://host:50470/webhdfs/v1/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz?op=GETFILESTATUS&user.name=hdfs'"'"'
 1>/tmp/tmpoIadeN 2>/tmp/tmp6nFiLj''] {'logoutput': None, 'quiet': False}
2017-02-08 03:55:55,292 - call returned (0, '')
2017-02-08 03:55:55,293 - DFS file /hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz is 
identical to /usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz, skipping the copying
2017-02-08 03:55:55,293 - Will attempt to copy tez tarball from 
/usr/hdp/2.6.0.0-xxx/tez/lib/tez.tar.gz to DFS at 
/hdp/apps/2.6.0.0-xxx/tez/tez.tar.gz.
2017-02-08 03:55:55,293 - HdfsResource[None] {'security_enabled': True, 
'hadoop_bin_dir': '/usr/hdp/current/hadoop-client/bin', 'keytab': 
'/etc/security/keytabs/hdfs.headless.keytab', 'dfs_type': '', 'default_fs': 
'hdfs://host:8020', 'hdfs_resource_ignore_file': 
'/var/lib/ambari-agent/data/.hdfs_resource_ignore', 'hdfs_site': ..., 
'kinit_path_local': '/usr/bin/kinit', 'principal_name': '[email protected]', 
'user': 'hdfs', 'action': ['execute'], 'hadoop_conf_dir': 
'/usr/hdp/current/hadoop-client/conf', 'immutable_paths': 
[u'/apps/hive/warehouse', u'/mr-history/done', u'/app-logs', u'/tmp']}
2017-02-08 03:55:55,294 - Execute['/usr/bin/kinit -kt 
/etc/security/keytabs/smokeuser.headless.keytab [email protected];'] 
{'user': 'ambari-qa'}
2017-02-08 03:55:55,389 - ExecuteHadoop['jar 
/usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount 
/tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'try_sleep': 5, 
'tries': 3, 'bin_dir': '/usr/hdp/current/hadoop-client/bin', 'user': 
'ambari-qa', 'conf_dir': '/usr/hdp/current/hadoop-client/conf'}
2017-02-08 03:55:55,390 - Execute['hadoop --config 
/usr/hdp/current/hadoop-client/conf jar 
/usr/hdp/current/tez-client/tez-examples*.jar orderedwordcount 
/tmp/tezsmokeinput/sample-tez-test /tmp/tezsmokeoutput/'] {'logoutput': None, 
'try_sleep': 5, 'environment': {}, 'tries': 3, 'user': 'ambari-qa', 'path': 
['/usr/hdp/current/hadoop-client/bin']}{code}

{code}
Requests: {
aborted_task_count: 0,
cluster_name: "cl1",
completed_task_count: 1,
create_time: 1486526151743,
end_time: 1486526463038,
exclusive: false,
failed_task_count: 0,
id: 29,
inputs: "{}",
operation_level: null,
progress_percent: 100,
queued_task_count: 0,
request_context: "WE API TEZ Service Check",
request_schedule: null,
request_status: "TIMEDOUT",
resource_filters: [
{
service_name: "TEZ"
}
],
start_time: 1486526151751,
task_count: 1,
timed_out_task_count: 1,
type: "COMMAND"
},{code}
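
For reference, the request duration can be worked out from the start_time and 
end_time values above. The snippet below is a quick check, assuming those 
fields are epoch milliseconds; the variable names are only illustrative, and 
task_timeout_s is the 600-second setting mentioned earlier.

```python
# Values copied from the request output above (assumed epoch milliseconds).
start_time_ms = 1486526151751
end_time_ms = 1486526463038

# agent.service.check.task.timeout from the description, in seconds.
task_timeout_s = 600

elapsed_s = (end_time_ms - start_time_ms) / 1000
print(f"elapsed: {elapsed_s:.1f} s")
print("exceeded task timeout:", elapsed_s > task_timeout_s)
```

The request ran for roughly 311 seconds, which is below the configured 
600-second timeout even though request_status is TIMEDOUT.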



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
