[ 
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137056#comment-15137056
 ] 

Jason Lowe commented on TEZ-3102:
---------------------------------

The hang occurs because TaskImpl.shouldScheduleNewAttempt returns false: it
believes there is still an uncompleted task attempt. That happens because, when
the speculative task attempt was killed, the TaskImpl state machine ignored the
kill event and did not decrement the number of outstanding attempts.
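
To illustrate the accounting problem, here is a minimal, self-contained sketch.
It is not the Tez code: apart from the name shouldScheduleNewAttempt, every
class, method, and field below is hypothetical, and the assumption that the
kill is ignored because the task has already succeeded is mine. It only models
how an ignored kill event leaves the outstanding-attempt counter stuck above
zero.

```java
// Simplified, hypothetical model of the attempt accounting described above.
// This is NOT the Tez TaskImpl implementation; names are illustrative only.
public class AttemptAccountingSketch {

    static final class Task {
        // When false, a kill that arrives after success is ignored (the bug).
        private final boolean countKillAfterSuccess;
        private int uncompletedAttempts = 0;
        private boolean succeeded = false;

        Task(boolean countKillAfterSuccess) {
            this.countKillAfterSuccess = countKillAfterSuccess;
        }

        void launchAttempt() {
            uncompletedAttempts++;
        }

        void attemptSucceeded() {
            uncompletedAttempts--;
            succeeded = true;
        }

        void attemptKilled() {
            // Assumed buggy behavior: the kill event is dropped once the task
            // has succeeded, so the counter is never decremented.
            if (succeeded && !countKillAfterSuccess) {
                return;
            }
            uncompletedAttempts--;
        }

        // The successful attempt's output is later lost (fetch failures).
        void successfulAttemptFailedRetroactively() {
            succeeded = false;
        }

        boolean shouldScheduleNewAttempt() {
            return !succeeded && uncompletedAttempts == 0;
        }
    }

    // Replays the sequence from the issue and reports whether the task
    // would schedule a replacement attempt afterwards.
    public static boolean runScenario(boolean countKillAfterSuccess) {
        Task task = new Task(countKillAfterSuccess);
        task.launchAttempt();                        // original attempt
        task.launchAttempt();                        // speculative attempt
        task.attemptSucceeded();                     // original attempt wins
        task.attemptKilled();                        // speculative attempt killed
        task.successfulAttemptFailedRetroactively(); // fetch failures
        return task.shouldScheduleNewAttempt();
    }

    public static void main(String[] args) {
        System.out.println("kill ignored: reschedule = " + runScenario(false));
        System.out.println("kill counted: reschedule = " + runScenario(true));
    }
}
```

In this model, the variant that ignores the kill never reschedules, which
corresponds to the hang; counting the kill lets the counter reach zero so a new
attempt can be scheduled.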

> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
>                 Key: TEZ-3102
>                 URL: https://issues.apache.org/jira/browse/TEZ-3102
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jason Lowe
>            Assignee: Jason Lowe
>            Priority: Critical
>
> If a task speculates and then succeeds, one attempt will be marked successful
> and the other killed. If the task then retroactively fails due to fetch
> failures, the Tez AM will fail to reschedule another attempt. This results in
> a hung job.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
