[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645285#comment-14645285 ]

Hitesh Shah edited comment on TEZ-2311 at 7/29/15 12:44 AM:
------------------------------------------------------------

bq. The purpose is to reuse the code in 
KillNewJobTransition/KillInitedJobTransition/DAGKilledTransition, such as 
setTerminationCause.

Agreed, that would be good. However, I am not sure about all the conditions 
that can arise when the summary log exists but the recovery log may be 
lagging behind. That is why the potentially safer approach is to move 
directly to the desired state.
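To make the trade-off concrete, here is a minimal, hypothetical sketch in Java. It is not Tez code: `RecoveryKillSketch`, `Outcome`, and `recover` are invented names, and the real KillNewJobTransition/KillInitedJobTransition/DAGKilledTransition logic is not modeled. It assumes that a kill recorded in the summary log, or received while recovery is still running, should override whatever state the possibly stale recovery log reached. It also assumes that the termination cause is set explicitly instead of being left to a kill transition whose preconditions depend on that stale state.

```java
import java.util.List;

// Hypothetical, simplified model of the recovery decision; none of these names are Tez APIs.
public class RecoveryKillSketch {
    enum DAGState { NEW, INITED, RUNNING, KILLED }
    enum TerminationCause { DAG_KILL }

    static final class Outcome {
        final DAGState state;
        final TerminationCause cause; // null when the DAG was not terminated

        Outcome(DAGState state, TerminationCause cause) {
            this.state = state;
            this.cause = cause;
        }
    }

    /**
     * @param recoveryLog   states replayed from the (possibly lagging) recovery log
     * @param summaryKilled true if the summary log already records the DAG as killed
     * @param killPending   true if a kill request arrived while recovery was running
     */
    static Outcome recover(List<DAGState> recoveryLog, boolean summaryKilled, boolean killPending) {
        DAGState lastLogged = recoveryLog.isEmpty()
                ? DAGState.NEW
                : recoveryLog.get(recoveryLog.size() - 1);

        if (summaryKilled || killPending) {
            // Move straight to the desired terminal state and record the cause,
            // rather than routing through a kill transition that assumes
            // lastLogged is accurate (it may lag behind the summary log).
            return new Outcome(DAGState.KILLED, TerminationCause.DAG_KILL);
        }
        // No kill involved: resume from whatever state the recovery log reached.
        return new Outcome(lastLogged, null);
    }

    public static void main(String[] args) {
        // Summary log says the DAG was killed, but the recovery log only reached INITED.
        Outcome o = recover(List.of(DAGState.NEW, DAGState.INITED), true, false);
        System.out.println(o.state + " " + o.cause); // prints "KILLED DAG_KILL"
    }
}
```

The design choice this illustrates is that the terminal state comes from the more trustworthy source, the summary log or the pending kill. The lagging recovery log is only used when no kill is involved.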



> AM can hang if kill received while recovering from previous attempt
> -------------------------------------------------------------------
>
>                 Key: TEZ-2311
>                 URL: https://issues.apache.org/jira/browse/TEZ-2311
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.0
>            Reporter: Jason Lowe
>            Assignee: Jeff Zhang
>              Labels: Recovery
>         Attachments: TEZ-2311-1.patch, TEZ-2311-2.patch, TEZ-2311-3.patch
>
>
> We saw an instance of a Tez job hanging despite receiving multiple kill 
> requests from clients.  The AM was recovering from a prior attempt when the 
> first kill request arrived.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
