[ https://issues.apache.org/jira/browse/TEZ-808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Lowe updated TEZ-808:
---------------------------
Attachment: TEZ-808.branch-0.7.patch
Would it be possible to backport this to branch-0.7? We're going to be on 0.7
for a while, and we'd like this fix (along with TEZ-2918) to be able to catch
hung tasks in production and automatically recover.

Attaching a version of the patch for branch-0.7. It came over fairly cleanly.
> Handle task attempts that are not making progress
> -------------------------------------------------
>
> Key: TEZ-808
> URL: https://issues.apache.org/jira/browse/TEZ-808
> Project: Apache Tez
> Issue Type: Sub-task
> Reporter: Bikas Saha
> Assignee: Bikas Saha
> Fix For: 0.8.2
>
> Attachments: TEZ-808.1.patch, TEZ-808.2.patch, TEZ-808.3.patch,
> TEZ-808.branch-0.7.patch
>
>
> If a task attempt is not making progress then it may cause the job to hang.
> We may want to kill and restart the attempt. With speculation support and
> free resources we may want to run another version in parallel.