[
https://issues.apache.org/jira/browse/TEZ-3102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jason Lowe updated TEZ-3102:
----------------------------
Attachment: TEZ-3102.003.patch
Thanks for the reviews, Bikas!
testTaskSucceedAndRetroActiveFailure doesn't cover the change since it exercises
the failed transition rather than the killed transition, so I added a test that
explicitly kills a successful attempt and verifies that the task goes back to
scheduling a new attempt.
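For illustration, the scenario the new test exercises can be sketched with a toy
model. This is only a rough sketch under assumptions: ToyTask, its states, and
its methods are hypothetical names invented here, not the Tez TaskImpl code or
the test in the patch. It shows a speculative attempt succeeding, the other
attempt being killed, and the task returning to a schedulable state when the
successful attempt is later killed.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model only: all names here are hypothetical, not Tez APIs. */
class ToyTask {
    enum TaskState { SCHEDULED, RUNNING, SUCCEEDED }
    enum AttemptState { RUNNING, SUCCEEDED, KILLED, FAILED }

    private final Map<Integer, AttemptState> attempts = new HashMap<>();
    private TaskState state = TaskState.SCHEDULED;
    private Integer successfulAttempt = null;
    private int nextAttemptId = 0;

    /** Launches a new attempt (original or speculative) and returns its id. */
    int launchAttempt() {
        int id = nextAttemptId++;
        attempts.put(id, AttemptState.RUNNING);
        if (state == TaskState.SCHEDULED) {
            state = TaskState.RUNNING;
        }
        return id;
    }

    /** First attempt to succeed wins; any attempt still running is killed. */
    void attemptSucceeded(int id) {
        if (state == TaskState.SUCCEEDED) {
            return; // another attempt already won
        }
        attempts.put(id, AttemptState.SUCCEEDED);
        successfulAttempt = id;
        state = TaskState.SUCCEEDED;
        for (Map.Entry<Integer, AttemptState> e : attempts.entrySet()) {
            if (e.getValue() == AttemptState.RUNNING) {
                e.setValue(AttemptState.KILLED);
            }
        }
    }

    /** Retroactive kill of an attempt, e.g. after it had already succeeded. */
    void attemptKilled(int id) {
        retire(id, AttemptState.KILLED);
    }

    /** Retroactive failure of an attempt, e.g. due to fetch failures. */
    void attemptFailed(int id) {
        retire(id, AttemptState.FAILED);
    }

    // Both paths must revert the task if the retired attempt was the winner;
    // otherwise nothing is ever rescheduled and the job hangs.
    private void retire(int id, AttemptState terminal) {
        attempts.put(id, terminal);
        if (successfulAttempt != null && successfulAttempt == id) {
            successfulAttempt = null;
            state = TaskState.SCHEDULED;
        }
    }

    TaskState state() {
        return state;
    }

    AttemptState attemptState(int id) {
        return attempts.get(id);
    }

    public static void main(String[] args) {
        ToyTask task = new ToyTask();
        int original = task.launchAttempt();
        int speculative = task.launchAttempt();
        task.attemptSucceeded(speculative); // original attempt gets killed
        task.attemptKilled(speculative);    // successful attempt killed later
        System.out.println("original attempt: " + task.attemptState(original));
        System.out.println("task state: " + task.state());
    }
}
```

In this sketch the kill and fail paths share one helper, so neither can leave the
task stuck in a succeeded state with no live attempt.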
The reported test failures appear to be unrelated, as they pass for me locally.
> Fetch failure of a speculated task causes job hang
> --------------------------------------------------
>
> Key: TEZ-3102
> URL: https://issues.apache.org/jira/browse/TEZ-3102
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jason Lowe
> Assignee: Jason Lowe
> Priority: Critical
> Attachments: TEZ-3102.001.patch, TEZ-3102.002.patch,
> TEZ-3102.003.patch
>
>
> If a task speculates and then succeeds, one attempt will be marked successful
> and the other killed. If the successful attempt then retroactively fails due
> to fetch failures, the Tez AM fails to reschedule another attempt. This
> results in a hung job.