[ https://issues.apache.org/jira/browse/HADOOP-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668331#action_12668331 ]

Devaraj Das commented on HADOOP-5131:
-------------------------------------

Actually, I think there is a problem with the patch in MapOutputCopier.run().
The affected code is:
{code}  
synchronized (scheduledCopies) {
   while (scheduledCopies.isEmpty()) {
      scheduledCopies.wait();
   }
}
loc = scheduledCopies.remove(0);
{code}
The line {noformat}scheduledCopies.remove(0){noformat} should be inside the
{noformat}synchronized(scheduledCopies){noformat} block. Otherwise there is a
race condition. Assume one object got inserted into the scheduledCopies list and
two copier threads got notified. Thread T1 wakes up, finds the list non-empty and
leaves the synchronized block, but just before it calls scheduledCopies.remove(0)
there is a context switch to thread T2. T2 also finds scheduledCopies non-empty
and goes ahead with a successful call to scheduledCopies.remove(0). When T1 gets
to run again, its remove(0) call is made on an empty list; rather than returning
null, List.remove(int) throws an IndexOutOfBoundsException in that case. I think
we should prevent this from happening.
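A minimal sketch of the suggested fix, assuming scheduledCopies is a plain java.util.List guarded by its own monitor and that producers call notifyAll(). The emptiness check and remove(0) run under the same lock, so a copier that wakes up can never find its element already taken. CopyScheduler, CopySchedulerDemo and the negative "shutdown sentinel" are hypothetical names for illustration, not Hadoop's actual classes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the shared scheduledCopies list; not Hadoop code.
class CopyScheduler<T> {
  private final List<T> scheduledCopies = new ArrayList<T>();

  // Producer side: add a location, then wake every waiting copier thread.
  void schedule(T loc) {
    synchronized (scheduledCopies) {
      scheduledCopies.add(loc);
      scheduledCopies.notifyAll();
    }
  }

  // Copier side: the emptiness check and remove(0) both run under the same
  // lock, so no other thread can take the element in between.
  T take() throws InterruptedException {
    synchronized (scheduledCopies) {
      while (scheduledCopies.isEmpty()) {
        scheduledCopies.wait();
      }
      return scheduledCopies.remove(0);
    }
  }
}

// Demo: several copier threads drain the list. Negative values are shutdown
// sentinels (an assumption of this sketch), one scheduled per copier.
class CopySchedulerDemo {
  static List<Integer> run(int copiers, int items) throws InterruptedException {
    final CopyScheduler<Integer> scheduler = new CopyScheduler<Integer>();
    final List<Integer> taken =
        Collections.synchronizedList(new ArrayList<Integer>());
    List<Thread> threads = new ArrayList<Thread>();
    for (int i = 0; i < copiers; i++) {
      Thread t = new Thread(() -> {
        try {
          while (true) {
            int loc = scheduler.take();
            if (loc < 0) {
              return; // sentinel: this copier is done
            }
            taken.add(loc);
          }
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      threads.add(t);
      t.start();
    }
    for (int i = 0; i < items; i++) {
      scheduler.schedule(i);
    }
    for (int i = 0; i < copiers; i++) {
      scheduler.schedule(-1);
    }
    for (Thread t : threads) {
      t.join();
    }
    // Every scheduled value should have been taken exactly once.
    List<Integer> result = new ArrayList<Integer>(taken);
    Collections.sort(result);
    return result;
  }
}
```

java.util.concurrent.BlockingQueue (for example LinkedBlockingQueue.take()) performs the same wait-then-remove atomically and could replace the hand-rolled wait/notifyAll loop.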

> ReduceCopier sleeps for a hardcoded interval of 5secs
> -----------------------------------------------------
>
>                 Key: HADOOP-5131
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5131
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.20.0
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>         Attachments: HADOOP-5131_0_20090127.patch
>
>
> Currently ReduceCopier.run has a hard-coded 5s sleep, which hurts jobs
> where latency is important. We really should have a mechanism where the
> thread fetching task-completion events notifies ReduceCopier.run.
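A minimal sketch of the notification mechanism the description asks for, assuming a hypothetical EventSignal object shared by the thread fetching task-completion events and the ReduceCopier.run loop. Instead of sleeping a fixed 5 seconds, the copier waits on a monitor with the old interval as an upper bound (maxWaitMillis = 5000), and the fetcher calls eventsArrived() as soon as it has new events. EventSignal, EventSignalDemo and their methods are illustrative names only, not Hadoop's API.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical signal shared by the event-fetcher thread and the copier loop.
class EventSignal {
  private final Object lock = new Object();
  private long generation = 0; // bumped each time new events are fetched

  // Called by the fetcher thread after it pulls new task-completion events.
  void eventsArrived() {
    synchronized (lock) {
      generation++;
      lock.notifyAll();
    }
  }

  // Snapshot to pass back into awaitNewEvents.
  long currentGeneration() {
    synchronized (lock) {
      return generation;
    }
  }

  // Blocks until the generation moves past `seen` or `maxWaitMillis` elapses,
  // whichever comes first. Returns true if new events were signalled.
  boolean awaitNewEvents(long seen, long maxWaitMillis) throws InterruptedException {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxWaitMillis);
    synchronized (lock) {
      while (generation == seen) {
        long remainingMillis = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
        if (remainingMillis <= 0) {
          return false; // timed out: equivalent to the old fixed sleep expiring
        }
        lock.wait(remainingMillis); // never called with 0, which would wait forever
      }
      return true;
    }
  }
}

// Demo: a simulated fetcher signals after `delayMillis`; returns how long the
// copier actually waited, in milliseconds.
class EventSignalDemo {
  static long measureWait(long delayMillis, long maxWaitMillis) throws InterruptedException {
    final EventSignal signal = new EventSignal();
    long seen = signal.currentGeneration();
    Thread fetcher = new Thread(() -> {
      try {
        Thread.sleep(delayMillis);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      signal.eventsArrived();
    });
    long start = System.nanoTime();
    fetcher.start();
    signal.awaitNewEvents(seen, maxWaitMillis);
    long waited = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    fetcher.join();
    return waited;
  }
}
```

Checking a generation counter under the lock, rather than relying on the notification alone, means a signal that fires before the copier starts waiting is not lost, while the timeout keeps the old 5-second poll as a fallback.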

-- 
This message is automatically generated by JIRA.