[
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114514#comment-16114514
]
Kuhu Shukla commented on TEZ-3803:
----------------------------------
Thank you [~jlowe].
bq. We get the return code from await but ignore it. I don't think we need to
assign it since we don't care why we woke up given the while condition will
retest.
Sorry I did not bring this up sooner. I added the assignment to get rid of the
findbugs warning that the status returned by await is not being used anywhere.
The assignment removes that warning; I can drop it and ignore the findbugs
warning instead if we prefer that.
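To illustrate the point about the ignored return code, here is a minimal sketch of a timed wait inside a re-testing loop. It is not the patch: ProgressWaiter, waitForInputs, markDone and progressCount are hypothetical names, and only Condition.await(long, TimeUnit) is a real JDK call. That call returns false when the timeout elapsed, but because the while condition is re-tested on every wake-up, the loop behaves the same whichever value comes back.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for the waiting component; not the Tez class.
class ProgressWaiter {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition wakeUp = lock.newCondition();
  private final AtomicInteger progressCount = new AtomicInteger();
  private boolean inputsDone = false;

  // Blocks until markDone() is called, reporting progress on every wake-up.
  void waitForInputs(long pollMillis) throws InterruptedException {
    lock.lock();
    try {
      while (!inputsDone) {
        // The result is assigned but never read: the loop re-tests
        // inputsDone regardless of why the thread woke up.
        boolean signalled = wakeUp.await(pollMillis, TimeUnit.MILLISECONDS);
        progressCount.incrementAndGet(); // stands in for "notify progress"
      }
    } finally {
      lock.unlock();
    }
  }

  // Marks the inputs as complete and wakes any waiter.
  void markDone() {
    lock.lock();
    try {
      inputsDone = true;
      wakeUp.signalAll();
    } finally {
      lock.unlock();
    }
  }

  int progressCount() {
    return progressCount.get();
  }
}
```

Dropping the `boolean signalled =` assignment would not change the behaviour of this loop; whether a static-analysis tool then reports the ignored return value depends on the tool and its configuration.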
bq. Why the explicit ShuffleScheduler.this qualification in
waitAndNotifyProgress? It wasn't originally qualified, and I'm not seeing the
need to do it here.
ShuffleSchedulerCallable#callInternal now uses the same waitAndNotifyProgress()
call, and the original code there qualified the wait object as
ShuffleScheduler.this.wait(). I kept that qualification so that the wait used by
ShuffleSchedulerCallable stays on the same object. Please correct me if I am wrong.
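As a hedged illustration of the qualification question, the sketch below uses hypothetical names (Scheduler, Worker, finish, progressReports) rather than the actual Tez classes. It assumes the shared wait lives in an instance method of the outer class. Under that assumption an unqualified wait() inside the outer method already targets the outer object's monitor, while an unqualified wait() written directly inside the inner class would target the inner instance instead.

```java
// Hypothetical stand-in for the outer scheduler class; not the Tez class.
class Scheduler {
  private boolean done = false;
  private int progressReports = 0;

  // Instance method of the outer class: wait(millis) here waits on the
  // same monitor as Scheduler.this.wait(millis).
  synchronized void waitAndNotifyProgress(long millis) throws InterruptedException {
    wait(millis);
    progressReports++; // stands in for "notify progress"
  }

  synchronized void finish() {
    done = true;
    notifyAll();
  }

  synchronized int progressReports() {
    return progressReports;
  }

  // Inner class: an unqualified wait() written here would resolve to the
  // Worker instance, so direct waits need the Scheduler.this qualifier.
  class Worker implements Runnable {
    @Override
    public void run() {
      try {
        synchronized (Scheduler.this) {
          while (!done) {
            // Resolves to the enclosing Scheduler's method, which waits
            // on the Scheduler monitor without further qualification.
            waitAndNotifyProgress(5);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
  }
}
```

In this sketch the explicit qualifier would be harmless but redundant inside waitAndNotifyProgress; it only matters for waits written directly in the inner class.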
> Tasks can get killed due to insufficient progress while waiting for shuffle
> inputs to complete
> ----------------------------------------------------------------------------------------------
>
> Key: TEZ-3803
> URL: https://issues.apache.org/jira/browse/TEZ-3803
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Kuhu Shukla
> Assignee: Kuhu Shukla
> Priority: Critical
> Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch,
> TEZ-3803.003.patch, TEZ-3803.004.patch
>
>
> In a scenario where a downstream task has no slow start and gets started
> before all its shuffle inputs are done, the task can time out because the wait
> does not notify progress (set the "progress is being made" bit) like it does in
> MapReduce.
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)