[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated TEZ-3102:
----------------------------
Attachment: TEZ-3102.001.patch
Attaching a patch that does sufficient processing of the kill event for the
task that lost the speculation race to prevent the task state machine from
thinking it still has outstanding attempts.
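
As a rough illustration of the failure mode, here is a minimal sketch. It is
not the Tez task state machine and not the contents of the attached patch:
the class SpeculationSketch, its methods, and the processKillAfterSuccess
flag are all hypothetical names invented for this example. It assumes the
hang comes from stale bookkeeping, where the attempt that lost the
speculation race is still counted as outstanding when the successful attempt
later fails.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model for illustration only; this is NOT Tez's TaskImpl.
public class SpeculationSketch {
    // Attempt ids the task believes are still running.
    private final Set<Integer> outstanding = new HashSet<>();
    private Integer successfulAttempt = null;
    private int nextAttemptId = 0;
    // true models the patched behavior, false models the assumed bug.
    private final boolean processKillAfterSuccess;

    public SpeculationSketch(boolean processKillAfterSuccess) {
        this.processKillAfterSuccess = processKillAfterSuccess;
    }

    public int launchAttempt() {
        int id = nextAttemptId++;
        outstanding.add(id);
        return id;
    }

    public void attemptSucceeded(int id) {
        outstanding.remove(id);
        successfulAttempt = id;
    }

    // Kill event for the attempt that lost the speculation race.
    public void attemptKilled(int id) {
        if (successfulAttempt != null && !processKillAfterSuccess) {
            return; // assumed bug: event dropped, stale entry stays behind
        }
        outstanding.remove(id);
    }

    // Retroactive failure of the successful attempt (e.g. fetch failures).
    // Returns true if a replacement attempt was scheduled.
    public boolean successfulAttemptFailed() {
        successfulAttempt = null;
        if (!outstanding.isEmpty()) {
            return false; // waits forever on an attempt that no longer exists
        }
        launchAttempt();
        return true;
    }

    public static void main(String[] args) {
        for (boolean patched : new boolean[] {false, true}) {
            SpeculationSketch task = new SpeculationSketch(patched);
            int original = task.launchAttempt();
            int speculative = task.launchAttempt();
            task.attemptSucceeded(original);
            task.attemptKilled(speculative);
            System.out.println("patched=" + patched
                + " rescheduled=" + task.successfulAttemptFailed());
        }
    }
}
```

In this model, the unpatched variant never reschedules because it keeps
waiting on the killed attempt, while the patched variant clears the stale
entry and launches a replacement.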
> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3102.001.patch
>
>
> If a task speculates and then succeeds, one attempt is marked successful and
> the other is killed. If the successful attempt then retroactively fails due to
> fetch failures, the Tez AM fails to reschedule another attempt, which results
> in a hung job.