[ 
https://issues.apache.org/jira/browse/TEZ-3803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16114494#comment-16114494
 ] 

Jason Lowe commented on TEZ-3803:
---------------------------------

Thanks for updating the patch!  Looks much better, just nits at this point.

The code within the while block has an indentation of 4 instead of 2 which 
makes it harder to read since it lines up with the continuation of the 
conditional rather than standing out on its own as a block start should.
{code}
          while ((runningFetchers.size() >= numFetchers || 
pendingHosts.isEmpty())
              && numCompletedInputs.get() < numInputs) {
              inputContext.notifyProgress();
              boolean ret = wakeLoop.await(1000, TimeUnit.MILLISECONDS);
          }
{code}

We get the return code from await but ignore it.  I don't think we need to 
assign it since we don't care why we woke up given the while condition will 
retest.

waitAndNotifyProgress should be private.

Why the explicit ShuffleScheduler.this qualification in waitAndNotifyProgress?  
It wasn't originally qualified, and I'm not seeing the need to do it here.


> Tasks can get killed due to insufficient progress while waiting for shuffle 
> inputs to complete
> ----------------------------------------------------------------------------------------------
>
>                 Key: TEZ-3803
>                 URL: https://issues.apache.org/jira/browse/TEZ-3803
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Kuhu Shukla
>            Assignee: Kuhu Shukla
>            Priority: Critical
>         Attachments: TEZ-3803.001.patch, TEZ-3803.002.patch, 
> TEZ-3803.003.patch, TEZ-3803.004.patch
>
>
> In a scenario where a downstream task has no slow start and gets started 
> before all its shuffle inputs are done, the task can timeout as the wait does 
> not notify progress( set the "progress is being made bit") like it does in 
> MapReduce.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Reply via email to