[ 
https://issues.apache.org/jira/browse/TEZ-796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bikas Saha updated TEZ-796:
---------------------------

    Attachment: TEZ-796.1.patch

This is a race condition. A DAG failure causes the vertex to be killed, which 
in turn kills its tasks; running tasks move to the KILL_WAIT state until their 
attempts are killed. For one task, however, the attempt had just succeeded, and 
the task ignores the attempt-succeeded event while in KILL_WAIT. It therefore 
hangs in KILL_WAIT forever, because that attempt is never going to be killed.
The attached patch adds a fix and tests.
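
To make the race concrete, here is a minimal sketch of the state handling 
described above. It is not the Tez source and not the contents of 
TEZ-796.1.patch: the class, enum, and method names (KillWaitSketch, Task, 
onAttemptEvent, handleSuccessInKillWait) are hypothetical, and ending in KILLED 
after a late success is an assumption made for the illustration.

```java
// Hypothetical, simplified model of the race; not the Tez source or the patch.
public class KillWaitSketch {
    public enum TaskState { RUNNING, KILL_WAIT, KILLED, SUCCEEDED }
    public enum AttemptEvent { ATTEMPT_SUCCEEDED, ATTEMPT_KILLED }

    public static final class Task {
        private TaskState state = TaskState.RUNNING;
        // Assumption: a flag toggles between the buggy and the fixed behaviour.
        private final boolean handleSuccessInKillWait;

        public Task(boolean handleSuccessInKillWait) {
            this.handleSuccessInKillWait = handleSuccessInKillWait;
        }

        // The vertex kills the task: a running task waits for its attempt to die.
        public void kill() {
            if (state == TaskState.RUNNING) {
                state = TaskState.KILL_WAIT;
            }
        }

        public void onAttemptEvent(AttemptEvent event) {
            switch (state) {
                case RUNNING:
                    state = (event == AttemptEvent.ATTEMPT_SUCCEEDED)
                            ? TaskState.SUCCEEDED : TaskState.KILLED;
                    break;
                case KILL_WAIT:
                    if (event == AttemptEvent.ATTEMPT_SUCCEEDED
                            && !handleSuccessInKillWait) {
                        return; // buggy path: event dropped, task stays in KILL_WAIT
                    }
                    // Either outcome means the attempt is finished, so stop waiting.
                    state = TaskState.KILLED;
                    break;
                default:
                    break; // terminal states ignore further events
            }
        }

        public TaskState getState() {
            return state;
        }
    }

    public static void main(String[] args) {
        for (boolean fixed : new boolean[] {false, true}) {
            Task task = new Task(fixed);
            task.kill();                                          // vertex kill arrives
            task.onAttemptEvent(AttemptEvent.ATTEMPT_SUCCEEDED);  // attempt had just succeeded
            System.out.println("fixed=" + fixed + " -> " + task.getState());
        }
    }
}
```

With the flag off, the task stays in KILL_WAIT, which matches the hang 
reported in this issue; with it on, the late success is treated as the end of 
the attempt and the task leaves KILL_WAIT.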

> AM Hangs & does not kill containers when map-task fails
> -------------------------------------------------------
>
>                 Key: TEZ-796
>                 URL: https://issues.apache.org/jira/browse/TEZ-796
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.3.0
>            Reporter: Gopal V
>            Assignee: Bikas Saha
>         Attachments: TEZ-796.1.patch, last_dag_error.txt.gz
>
>
> The task hangs after a vertex fails with an error and continues to idle 
> without returning an error.
> Looks like reducer spinups continue even after a vertex fails.
> {code}
> task_1389745080241_1218_6_04_000974,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000976,KILL_WAIT,KILLED
> task_1389745080241_1218_6_09_000270,SCHEDULED,RUNNING
> task_1389745080241_1218_6_04_000978,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000979,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000980,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000981,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000982,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000984,KILL_WAIT,KILLED
> task_1389745080241_1218_6_04_000987,KILL_WAIT,KILLED
> {code}
> In the attached log file, the map vertex fails at 10:59 and DAG_FINISHED is 
> not triggered till 11:03, at which point the AM was killed by hand.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
