[
https://issues.apache.org/jira/browse/HADOOP-5131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12668331#action_12668331
]
Devaraj Das commented on HADOOP-5131:
-------------------------------------
Actually, I think there is a problem with the patch in MapOutputCopier.run().
The affected code is -
{code}
synchronized (scheduledCopies) {
  while (scheduledCopies.isEmpty()) {
    scheduledCopies.wait();
  }
}
loc = scheduledCopies.remove(0);
{code}
The line {noformat}scheduledCopies.remove(0){noformat} should be inside the
{noformat}synchronized(scheduledCopies){noformat} block. Otherwise there is a
race condition. Assume an object is inserted into the scheduledCopies list and
two copier threads are notified. Thread T1 wakes up and leaves the synchronized
block, but just before it calls scheduledCopies.remove(0), there is a context
switch to thread T2. T2 also finds scheduledCopies non-empty and makes a
successful call to scheduledCopies.remove(0). When T1 runs again, the list is
empty, so its remove(0) call fails instead of returning an entry (on an empty
java.util.List, remove(0) throws IndexOutOfBoundsException rather than
returning null). I think we should prevent this from happening.
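A minimal sketch of the suggested fix, assuming scheduledCopies is a plain java.util.List guarded by its own monitor. The class name ScheduledCopiesSketch, the String element type, and the schedule/takeNext/demo helpers are hypothetical stand-ins for illustration and are not taken from the patch. The only point is that the emptiness check and the remove(0) call happen under the same lock:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ScheduledCopiesSketch {
    // Hypothetical stand-in for the copier's shared list of pending copies.
    private final List<String> scheduledCopies = new ArrayList<String>();

    /** Producer side: add an entry and wake up any waiting copier threads. */
    public void schedule(String loc) {
        synchronized (scheduledCopies) {
            scheduledCopies.add(loc);
            scheduledCopies.notifyAll();
        }
    }

    /** Consumer side: check and removal are done while holding the same lock. */
    public String takeNext() throws InterruptedException {
        synchronized (scheduledCopies) {
            while (scheduledCopies.isEmpty()) {
                scheduledCopies.wait();
            }
            // Still inside the synchronized block, so no other thread can
            // empty the list between the isEmpty() check and this call.
            return scheduledCopies.remove(0);
        }
    }

    /** Two copier threads compete for two entries; each should get exactly one. */
    public static List<String> demo() throws InterruptedException {
        final ScheduledCopiesSketch queue = new ScheduledCopiesSketch();
        final List<String> taken =
            Collections.synchronizedList(new ArrayList<String>());
        Runnable copier = () -> {
            try {
                taken.add(queue.takeNext());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread t1 = new Thread(copier, "T1");
        Thread t2 = new Thread(copier, "T2");
        t1.start();
        t2.start();
        queue.schedule("map-output-0");
        queue.schedule("map-output-1");
        t1.join();
        t2.join();
        return taken;
    }

    public static void main(String[] args) throws InterruptedException {
        // The order of the two entries depends on thread scheduling.
        System.out.println(demo());
    }
}
```

With the removal inside the block, a thread that is woken after another thread has already drained the list simply re-checks isEmpty() and goes back to waiting.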
> ReduceCopier sleeps for a hardcoded interval of 5secs
> -----------------------------------------------------
>
> Key: HADOOP-5131
> URL: https://issues.apache.org/jira/browse/HADOOP-5131
> Project: Hadoop Core
> Issue Type: Bug
> Components: mapred
> Affects Versions: 0.20.0
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Attachments: HADOOP-5131_0_20090127.patch
>
>
> Currently ReduceCopier.run has a hard-coded 5s sleep, which hurts jobs
> where latency is important. We really should have a mechanism where the
> thread fetching task-completion events notifies ReduceCopier.run.