[
https://issues.apache.org/jira/browse/TEZ-160?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16313799#comment-16313799
]
Rohini Palaniswamy commented on TEZ-160:
----------------------------------------
Recently noticed that about 5% of Pig jobs launched from Oozie in a cluster
had application status KILLED even though the DAG succeeded and the Pig
scripts completed successfully. This was because Pig calls TezClient.stop() on
shutdown. If the AM has not shut down within 10 seconds, it calls
frameworkClient.killApplication(sessionAppId); which kills the AM. Because of
the 5-second sleep after shutdown is issued, whether an application finished
as SUCCEEDED or KILLED depended on whether the shutdown completed within the
next 5 seconds.
Can we skip this sleep if it is a user-initiated shutdown, or at least lower it
to 1 or 2 seconds? In the case of Pig, it is a Tez session and the Pig client
is calling shutdown. I think we can skip it in general if it is a Tez session.
The only time it will go down automatically is if the session timeout expires.
Adding another 5 seconds in that case is also wasteful.
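To make the timing concrete, here is a minimal sketch of the race described
above. It is a simplified model, not Tez code: the class ShutdownRace, the
method finalStatus and its parameters are hypothetical names, and it assumes
the AM exits once its shutdown work plus the fixed sleep have elapsed, while
the client kills the application if the AM is still running at the timeout.

```java
// Hypothetical model of the shutdown race; none of these names exist in Tez.
class ShutdownRace {
    enum FinalStatus { SUCCEEDED, KILLED }

    /**
     * @param amShutdownWorkMs    time the AM needs to finish its shutdown work
     * @param amSleepMs           fixed sleep at the end of AM completion
     * @param clientKillTimeoutMs how long the client waits before killing the app
     */
    static FinalStatus finalStatus(long amShutdownWorkMs,
                                   long amSleepMs,
                                   long clientKillTimeoutMs) {
        // Assumption: the AM only exits after the work and the sleep are both done.
        long amExitMs = amShutdownWorkMs + amSleepMs;
        // Assumption: the client kills the app if it is still running at the timeout.
        return amExitMs < clientKillTimeoutMs
                ? FinalStatus.SUCCEEDED
                : FinalStatus.KILLED;
    }

    public static void main(String[] args) {
        long clientTimeoutMs = 10_000; // the 10 second wait mentioned above

        // 5 second sleep: the AM has only 5 seconds left for its shutdown work.
        System.out.println(finalStatus(3_000, 5_000, clientTimeoutMs));
        System.out.println(finalStatus(6_000, 5_000, clientTimeoutMs));

        // Same 6 seconds of shutdown work, with the sleep lowered to 1 second.
        System.out.println(finalStatus(6_000, 1_000, clientTimeoutMs));
    }
}
```

Under these assumptions, 6 seconds of shutdown work ends in KILLED with the
5 second sleep and in SUCCEEDED once the sleep is lowered to 1 second.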
> Remove 5 second sleep at the end of AM completion.
> --------------------------------------------------
>
> Key: TEZ-160
> URL: https://issues.apache.org/jira/browse/TEZ-160
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Siddharth Seth
> Labels: TEZ-0.2.0
> Attachments: test.timeouts.txt
>
>
> ClientServiceDelegate/DAGClient doesn't seem to be getting job completion
> status from the AM after job completion. It, instead, always relies on the RM
> for this information. The information returned by the AM should be used while
> it's available.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)