[ 
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114514#comment-16114514
 ] 

Kuhu Shukla commented on TEZ-3803:
----------------------------------

Thank you [~jlowe].
bq. We get the return code from await but ignore it. I don't think we need to 
assign it since we don't care why we woke up given the while condition will 
retest.
Sorry I did not bring this up sooner. I added the assignment to get rid of the 
findbugs warning that the return value is not being used anywhere. The 
assignment removes that warning; alternatively, I can drop the assignment and 
ignore the findbugs warning if we prefer that.
bq. Why the explicit ShuffleScheduler.this qualification in 
waitAndNotifyProgress? It wasn't originally qualified, and I'm not seeing the 
need to do it here.
ShuffleSchedulerCallable#callInternal now uses the same 
waitAndNotifyProgress() call, and the original code there qualified the wait 
object as ShuffleScheduler.this.wait(). I kept the qualification so that the 
wait in ShuffleSchedulerCallable behaves the same as before. Please correct me 
if I am wrong.
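
For illustration only, here is a minimal sketch of the pattern discussed above: 
a timed wait inside a while loop that reports progress on every wake-up and 
deliberately discards the result of await, since the loop condition is 
retested anyway. The class, field, and method names (ProgressWaitSketch, 
wakeLoop, remainingInputs, waitForInputs) are hypothetical and are not taken 
from the patch; the counter merely stands in for whatever progress hook the 
task actually calls.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch, not the ShuffleScheduler code from the patch.
public class ProgressWaitSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeLoop = lock.newCondition();
    // Stand-in for the real "progress is being made" notification.
    private final AtomicInteger progressNotifications = new AtomicInteger();
    private int remainingInputs;

    public ProgressWaitSketch(int remainingInputs) {
        this.remainingInputs = remainingInputs;
    }

    // Called when one shuffle input finishes; wakes any waiter.
    public void inputCompleted() {
        lock.lock();
        try {
            remainingInputs--;
            wakeLoop.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Waits until all inputs are done, notifying progress on every wake-up.
    // Returns false if the waiting thread was interrupted.
    public boolean waitForInputs(long intervalMs) {
        lock.lock();
        try {
            while (remainingInputs > 0) {
                // The boolean result (signalled vs. timed out) is ignored on
                // purpose: the while condition retests the state either way.
                wakeLoop.await(intervalMs, TimeUnit.MILLISECONDS);
                progressNotifications.incrementAndGet();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }

    public int remainingInputs() {
        lock.lock();
        try {
            return remainingInputs;
        } finally {
            lock.unlock();
        }
    }

    public int progressNotifications() {
        return progressNotifications.get();
    }
}
```

In this sketch, findbugs-style checkers may still flag the ignored return 
value of await; assigning it to an unused local is one way to silence that, 
which is the trade-off raised in the review.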


> Tasks can get killed due to insufficient progress while waiting for shuffle 
> inputs to complete
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3803
>                 URL: https://issues.apache.org/jira/browse/TEZ-3803
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Critical
>         Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch, 
> TEZ-3803.003.patch, TEZ-3803.004.patch
>
>
> In a scenario where a downstream task has no slow start and gets started 
> before all its shuffle inputs are done, the task can time out because the 
> wait does not notify progress (set the "progress is being made" bit) as it 
> does in MapReduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
