[ 
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313799#comment-16313799
 ] 

Rohini Palaniswamy commented on TEZ-160:
----------------------------------------

I recently noticed that about 5% of Pig jobs launched from Oozie in a
cluster had application status KILLED even though the DAG succeeded and the Pig
scripts completed successfully. This was because Pig calls TezClient.stop() on
shutdown. If the AM has not shut down within 10 seconds, stop() calls
frameworkClient.killApplication(sessionAppId), which kills the AM. Because the AM
sleeps for 5 seconds after shutdown is issued, whether an application finished
as SUCCEEDED or KILLED depended on whether the rest of the shutdown completed
within the remaining 5 seconds.
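The race can be viewed as a simple timing budget. The sketch below is a hypothetical model, not Tez source code: `final_status`, `remaining_budget` and `shutdown_work_s` are illustrative names, and only the 10-second kill timeout and the 5-second post-shutdown sleep come from the comment above.

```python
# Hypothetical timing model of the stop()/killApplication race; not Tez code.
# Assumption: the client waits KILL_TIMEOUT_S after requesting shutdown, and the
# AM needs the fixed post-shutdown sleep plus shutdown_work_s before it exits.

KILL_TIMEOUT_S = 10.0  # client-side wait before killApplication (from the comment)
AM_SLEEP_S = 5.0       # AM sleep after shutdown is issued (from the comment)


def final_status(shutdown_work_s: float,
                 sleep_s: float = AM_SLEEP_S,
                 kill_timeout_s: float = KILL_TIMEOUT_S) -> str:
    """Return the status the application ends with under this model."""
    am_exit_time = sleep_s + shutdown_work_s
    if am_exit_time <= kill_timeout_s:
        return "SUCCEEDED"  # AM exited before the client gave up
    return "KILLED"         # client fell back to killApplication()


def remaining_budget(sleep_s: float = AM_SLEEP_S,
                     kill_timeout_s: float = KILL_TIMEOUT_S) -> float:
    """Time left for real shutdown work once the fixed sleep has been paid."""
    return max(0.0, kill_timeout_s - sleep_s)


if __name__ == "__main__":
    for work in (1.0, 4.5, 6.0):
        print(f"shutdown work {work:4.1f}s -> {final_status(work)}")
    print(f"budget with 5s sleep: {remaining_budget()}s")
    print(f"budget with no sleep: {remaining_budget(sleep_s=0.0)}s")
```

Under this model the fixed sleep consumes half of the client's 10-second window, so any AM needing more than 5 seconds of real shutdown work is recorded as KILLED; removing the sleep doubles the budget.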

Can we skip this sleep if it is a user-initiated shutdown, or at least lower it
to 1 or 2 seconds? In the case of Pig, it is a Tez session and the Pig client is
calling shutdown. I think we can skip it in general for Tez sessions. The only
time a session AM goes down on its own is when the session timeout expires, and
adding another 5 seconds in that case is also wasteful.
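The proposal can be expressed as a small policy function. This is a hypothetical sketch, not Tez's actual AM logic: `ShutdownCause`, `post_shutdown_sleep_s` and the `allow_skip`/`lowered_s` parameters are illustrative assumptions that encode "skip for user-initiated shutdowns and for sessions, or at least lower to 1 or 2 seconds".

```python
from enum import Enum


class ShutdownCause(Enum):
    # Hypothetical causes; Tez does not necessarily model shutdown this way.
    USER_REQUESTED = "user_requested"    # e.g. client calls TezClient.stop()
    SESSION_TIMEOUT = "session_timeout"  # idle session expired
    DAG_COMPLETED = "dag_completed"      # non-session AM finished its DAG


def post_shutdown_sleep_s(is_session: bool,
                          cause: ShutdownCause,
                          allow_skip: bool = True,
                          default_s: float = 5.0,
                          lowered_s: float = 2.0) -> float:
    """Seconds the AM would sleep after shutdown under the proposed policy."""
    # Sessions (user stop or timeout) and user-initiated shutdowns get a short path.
    wants_short = is_session or cause is ShutdownCause.USER_REQUESTED
    if not wants_short:
        return default_s  # keep the existing 5-second behaviour otherwise
    # Preferred option: skip entirely; fallback option: lower to 1-2 seconds.
    return 0.0 if allow_skip else lowered_s
```

For example, `post_shutdown_sleep_s(True, ShutdownCause.USER_REQUESTED)` models the Pig case and returns no sleep, while a non-session AM that simply finished its DAG keeps the original 5 seconds.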

> Remove 5 second sleep at the end of AM completion.
> --------------------------------------------------
>
>                 Key: TEZ-160
>                 URL: https://issues.apache.org/jira/browse/TEZ-160
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Siddharth Seth
>              Labels: TEZ-0.2.0
>         Attachments: test.timeouts.txt
>
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion 
> status from the AM after job completion. It, instead, always relies on the RM 
> for this information. The information returned by the AM should be used while 
> it's available.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
