[
https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603080#action_12603080
]
Arun C Murthy commented on HADOOP-3366:
---------------------------------------
bq. 0) The inMem merge thread needs to ignore the criteria when the shuffle
thread notifies it to do a forced merge.
The shuffle thread cannot _force_ a merge; the merge is still triggered only
when the various criteria are satisfied. The notification only wakes up the
possibly waiting merge-thread.
bq. 1) A race condition exists in the interval between the ramManager.notify
and mergePassComplete.wait() calls in getMapOutput. What could happen is that
the ramManager gets notified and it finishes the merge before this thread calls
mergePassComplete.wait(). If this happens the notification from the merger is
lost and this thread will just wait ...
Even if that notification is lost, the merge-thread always allows the shuffle
threads to progress by doing a 'mergeProgress.notifyAll' _before_ it sleeps -
thereby preventing any deadlocks.
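For illustration only (this is not code from the patch): the race in item 1
is the classic lost-notification pattern. One common way to make a lost
notification harmless is a guarded wait on a counter, sketched below. The
class MergePassSignal, its completedPasses counter and all method names are
hypothetical; the patch itself relies on the merge-thread's notifyAll as
described above.

```java
class MergePassSignal {
    private final Object lock = new Object();
    // Hypothetical counter of finished merge passes, guarded by 'lock'.
    private long completedPasses = 0;

    /** Merge thread: record that a pass finished and wake every waiter. */
    void passComplete() {
        synchronized (lock) {
            completedPasses++;
            lock.notifyAll();
        }
    }

    /** Shuffle thread: snapshot taken before it wakes the merge thread. */
    long currentPass() {
        synchronized (lock) {
            return completedPasses;
        }
    }

    /**
     * Shuffle thread: block until a pass completes after the snapshot.
     * Returns false if the thread was interrupted while waiting.
     */
    boolean awaitPassAfter(long seenPass) {
        synchronized (lock) {
            // The loop re-checks state, so a notifyAll that fired before
            // wait() was reached does not leave this thread stuck.
            while (completedPasses == seenPass) {
                try {
                    lock.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        MergePassSignal signal = new MergePassSignal();
        long seen = signal.currentPass();
        // Simulate the race: the merge finishes before the shuffle thread waits.
        signal.passComplete();
        // The counter has already advanced, so this does not block.
        boolean proceeded = signal.awaitPassAfter(seen);
        System.out.println("shuffle thread proceeded: " + proceeded);
    }
}
```

The waiter compares state rather than depending on the timing of a single
notify, so the order of passComplete and awaitPassAfter does not matter.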
{quote}
2) The handshake between the merger, copier and the ramManager looks complex
and there could be more race conditions like the one I pointed out above. I and
Sharad had a quick discussion and we feel it can be simplified.
Have the ramManager.reserve lock the thread if the request cannot be satisfied
Have the ramManager.unreserve do a notifyAll (this the mergeThread does)
Have the shuffle thread notify the mergeThread (before it goes to wait)
{quote}
I agree, it is complicated. However, the current design has a couple of
important points in its favour:
1. RamManager.reserve cannot block the thread without first closing the HTTP
connection; doing that inside reserve would leak the shuffle into the
RamManager, since you'd have to pass the HTTP input-stream to
RamManager.reserve.
2. It is much better to do a 'notifyAll' on the shuffle threads when the
_merge_ is complete, so one is reasonably sure that _all_ shuffle threads can
progress. Doing it in RamManager.unreserve would let only one shuffle thread
through at a time, and the contention for the lock would be very high: every
thread would wake up and get locked up inside the RamManager again.
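The two points above can be sketched roughly as follows. This is a
simplified illustration, not the RamManager from the patch: the class
RamBudgetSketch and the methods tryReserve, awaitRoom and mergeComplete are
hypothetical names. The reserve call never blocks, so the caller is free to
close its HTTP connection before waiting, and waiters are woken once per
merge rather than once per unreserve.

```java
import java.util.ArrayList;
import java.util.List;

class RamBudgetSketch {
    private final int capacityBytes;
    private int usedBytes = 0;

    RamBudgetSketch(int capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /** Non-blocking: returns false instead of parking the caller. */
    synchronized boolean tryReserve(int bytes) {
        if (usedBytes + bytes > capacityBytes) {
            return false;
        }
        usedBytes += bytes;
        return true;
    }

    /** Releases memory but deliberately wakes nobody. */
    synchronized void unreserve(int bytes) {
        usedBytes -= bytes;
    }

    /** Merge thread: free every merged segment, then wake all waiters once. */
    synchronized void mergeComplete(List<Integer> mergedSegmentSizes) {
        for (int size : mergedSegmentSizes) {
            unreserve(size);
        }
        notifyAll();
    }

    /**
     * Shuffle thread: wait for room after it has closed its connection.
     * Returns false if interrupted while waiting.
     */
    synchronized boolean awaitRoom(int bytes) {
        while (usedBytes + bytes > capacityBytes) {
            try {
                wait();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        RamBudgetSketch ram = new RamBudgetSketch(100);
        List<Integer> segments = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            ram.tryReserve(25);
            segments.add(25);
        }
        boolean whileFull = ram.tryReserve(25);   // budget is exhausted
        ram.mergeComplete(segments);              // frees all four segments
        boolean afterMerge = ram.awaitRoom(25) && ram.tryReserve(25);
        System.out.println(whileFull + " " + afterMerge); // prints "false true"
    }
}
```

Because the whole merged batch is released before the single notifyAll,
every woken thread has a reasonable chance of finding room, instead of all
threads contending for the space freed by one segment.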
> Shuffle/Merge improvements
> --------------------------
>
> Key: HADOOP-3366
> URL: https://issues.apache.org/jira/browse/HADOOP-3366
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 0.18.0
>
> Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch,
> HADOOP-3366_0_20080605.patch, ifile.patch
>
>
> This is intended to be a meta-issue to track various improvements to
> shuffle/merge in the reducer.