[
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Kuhu Shukla updated TEZ-3803:
-----------------------------
Attachment: TEZ-3803.004.patch
Revised patch that waits with a fixed (non-configurable) timeout, for
simplicity. We could pick a different value, or move it into ShuffleUtils as a
variable, if this approach seems OK. I also changed the test run time.
Essentially, this patch turns the if block into a while block. I will wait for
precommit before requesting further review.
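
For reviewers who want the shape of the change without opening the patch, here
is a minimal sketch of the if-to-while idea: wait in bounded slices and report
progress after each slice, so the task is not killed for looking idle. This is
not the patch itself. The class name, the 50 ms interval, and the counter that
stands in for the progress notification are all assumptions made for
illustration; none of them come from the Tez code base.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; names are not taken from the Tez code base or the patch. */
public class ProgressWaitSketch {
    // Assumed fixed wait interval; the value used in the patch is not stated here.
    static final long WAIT_INTERVAL_MS = 50;

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition inputsDone = lock.newCondition();
    private boolean done = false;

    // Counts progress notifications; a stand-in for the real progress call.
    final AtomicInteger progressNotifications = new AtomicInteger();

    /** Called by the (hypothetical) shuffle side once all inputs are fetched. */
    public void markDone() {
        lock.lock();
        try {
            done = true;
            inputsDone.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Waits for inputs in bounded slices, reporting progress after each slice. */
    public void waitForInputs() throws InterruptedException {
        lock.lock();
        try {
            // "while" rather than "if": re-check the condition after every timed wait.
            while (!done) {
                inputsDone.await(WAIT_INTERVAL_MS, TimeUnit.MILLISECONDS);
                progressNotifications.incrementAndGet(); // stand-in for notifying progress
            }
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws Exception {
        ProgressWaitSketch sketch = new ProgressWaitSketch();
        Thread producer = new Thread(() -> {
            try {
                Thread.sleep(200); // simulate shuffle inputs still arriving
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            sketch.markDone();
        });
        producer.start();
        sketch.waitForInputs();
        producer.join();
        System.out.println("progress notifications: " + sketch.progressNotifications.get());
    }
}
```

With a single untimed wait inside an if block, the waiting thread would report
nothing until the inputs complete. The timed wait in a loop gives it a regular
point at which to report.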
> Tasks can get killed due to insufficient progress while waiting for shuffle
> inputs to complete
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-3803
> URL: https://issues.apache.org/jira/browse/TEZ-3803
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Critical
> Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch,
> TEZ-3803.003.patch, TEZ-3803.004.patch
>
>
> In a scenario where a downstream task has no slow start and gets started
> before all its shuffle inputs are done, the task can time out because the
> wait does not notify progress (set the "progress is being made" bit) as it
> does in MapReduce.