[ https://issues.apache.org/jira/browse/TEZ-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645285#comment-14645285 ]
Hitesh Shah edited comment on TEZ-2311 at 7/29/15 12:44 AM:
------------------------------------------------------------
bq. The purpose is to reuse the code in
KillNewJobTransition/KillInitedJobTransition/DAGKilledTransition, such as
setTerminationCause.
Agreed, that would be good. However, I am not sure about all the potential
conditions that can arise, given that the summary log will exist but the
recovery log may be lagging behind; hence the potentially safer approach of
just moving to the desired state.
was (Author: hitesh):
bq. The purpose is to reuse the code in
KillNewJobTransition/KillInitedJobTransition/DAGKilledTransition, such as
setTerminationCause.
Agreed, that would be good. However, I am not sure about all the potential
conditions that can arise, given that the summary log will exist but the
recovery log may be lagging behind.
> AM can hang if kill received while recovering from previous attempt
> -------------------------------------------------------------------
>
> Key: TEZ-2311
> URL: https://issues.apache.org/jira/browse/TEZ-2311
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.0
> Reporter: Jason Lowe
> Assignee: Jeff Zhang
> Labels: Recovery
> Attachments: TEZ-2311-1.patch, TEZ-2311-2.patch, TEZ-2311-3.patch
>
>
> We saw an instance of a Tez job hanging despite receiving multiple kill
> requests from clients. The AM was recovering from a prior attempt when the
> first kill request arrived.