[ https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated TEZ-3102:
----------------------------
Attachment: TEZ-3102.002.patch
Sorry for the late reply, I was out on vacation.
Ah, yes, I somehow missed the successfulAttempt check when I looked at it. I
updated the patch to reuse the AttemptKilledTransition logic for both the
successful and unsuccessful attempt paths in the retroactive-kill case.
> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch
>
>
> If a task speculates and then succeeds, one attempt will be marked successful
> and the other killed. If the task then retroactively fails due to fetch
> failures, the Tez AM fails to reschedule another attempt. This results in a
> hung job.
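For illustration only, here is a toy Java model of the invariant the fix is meant to restore. It is not Tez's TaskImpl, and every name in it (ToyTask, AttemptState, retroactivelyInvalidate, handleAttemptEnded) is hypothetical. Once the speculative sibling has been killed and the successful attempt is later invalidated by fetch failures, no attempt is running, so the task must schedule a replacement or the job hangs. Routing both the successful and unsuccessful paths through one shared handler loosely mirrors the idea of reusing the AttemptKilledTransition logic for both.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT Tez's TaskImpl. All names here are hypothetical.
public class ToyTask {
    enum AttemptState { RUNNING, SUCCEEDED, KILLED, FAILED }

    private static final int NO_ATTEMPT = -1;

    private final List<AttemptState> attempts = new ArrayList<>();
    private int successfulAttempt = NO_ATTEMPT;

    // Launch a new attempt and return its index.
    int scheduleAttempt() {
        attempts.add(AttemptState.RUNNING);
        return attempts.size() - 1;
    }

    // The first attempt to finish wins; any other running attempt
    // (e.g. a speculative sibling) is killed.
    void attemptSucceeded(int id) {
        attempts.set(id, AttemptState.SUCCEEDED);
        successfulAttempt = id;
        for (int i = 0; i < attempts.size(); i++) {
            if (attempts.get(i) == AttemptState.RUNNING) {
                attempts.set(i, AttemptState.KILLED);
            }
        }
    }

    // An attempt's output is declared lost after the fact, e.g. because
    // downstream consumers reported fetch failures against it.
    void retroactivelyInvalidate(int id) {
        attempts.set(id, AttemptState.FAILED);
        if (id == successfulAttempt) {
            successfulAttempt = NO_ATTEMPT; // the task is no longer complete
        }
        // Successful and unsuccessful paths share the same handling.
        handleAttemptEnded();
    }

    // Shared "attempt ended" logic: if the task has no successful attempt and
    // nothing is still running, schedule a replacement so the job cannot hang.
    private void handleAttemptEnded() {
        if (successfulAttempt == NO_ATTEMPT && runningAttempts() == 0) {
            scheduleAttempt();
        }
    }

    int runningAttempts() {
        int n = 0;
        for (AttemptState s : attempts) {
            if (s == AttemptState.RUNNING) {
                n++;
            }
        }
        return n;
    }

    boolean hasSucceeded() {
        return successfulAttempt != NO_ATTEMPT;
    }

    public static void main(String[] args) {
        ToyTask task = new ToyTask();
        task.scheduleAttempt();                    // original attempt 0
        int speculative = task.scheduleAttempt();  // speculative attempt 1
        task.attemptSucceeded(speculative);        // attempt 0 is killed
        task.retroactivelyInvalidate(speculative); // fetch failures reported
        System.out.println("running attempts after invalidation: "
                + task.runningAttempts());
    }
}
```

Running main prints "running attempts after invalidation: 1": the shared handler notices that nothing is running and the task is no longer successful, so it schedules a replacement. Invalidating the already-killed sibling instead leaves the task successful and schedules nothing.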
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)