[ 
https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603080#action_12603080
 ] 

Arun C Murthy commented on HADOOP-3366:
---------------------------------------

bq. 0) The inMem merge thread needs to ignore the criteria when the shuffle 
thread notifies it to do a forced merge.
The shuffle thread cannot _force_ a merge; the merge is still triggered only 
when the various criteria are satisfied. The notification only wakes up the 
possibly waiting merge-thread.
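
For illustration, the "wake-up only" behaviour can be sketched with a plain Java monitor. This is a minimal sketch, not the code in the patch: the class MergeTrigger, its methods, and the single byte-count criterion are all hypothetical stand-ins for the real merge criteria.

```java
/** Hypothetical stand-in for the in-memory merge trigger; not Hadoop's actual class. */
class MergeTrigger {
    private final long thresholdBytes; // assumed criterion: bytes buffered in RAM
    private long bufferedBytes = 0;

    MergeTrigger(long thresholdBytes) {
        this.thresholdBytes = thresholdBytes;
    }

    /** Shuffle side: record copied bytes and wake the merge thread. */
    synchronized void addAndNotify(long bytes) {
        bufferedBytes += bytes;
        notifyAll(); // a wake-up only, not a command to merge
    }

    /** Merge side: block until the criterion holds, then claim the buffered bytes. */
    synchronized long awaitMerge() throws InterruptedException {
        while (bufferedBytes < thresholdBytes) {
            wait(); // every wake-up re-checks the criterion
        }
        long merged = bufferedBytes;
        bufferedBytes = 0;
        return merged;
    }

    /** Demo: one notification below the threshold, then one that crosses it. */
    static long[] demo() {
        MergeTrigger trigger = new MergeTrigger(100);
        long[] result = {-1, -1}; // [0]: merger still waiting after first notify, [1]: merged bytes
        Thread merger = new Thread(() -> {
            try {
                result[1] = trigger.awaitMerge();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            merger.start();
            trigger.addAndNotify(40); // wakes the merger, but 40 < 100
            merger.join(200);
            result[0] = merger.isAlive() ? 1 : 0;
            trigger.addAndNotify(70); // 110 >= 100, so the merge proceeds
            merger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }
}
```

In the demo the first notification leaves the merge thread waiting, because the criterion is re-checked after every wake-up; only the second one lets awaitMerge return.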

bq. 1) A race condition exists in the interval between the ramManager.notify 
and mergePassComplete.wait() calls in getMapOutput. What could happen is that 
the ramManager gets notified and it finishes the merge before this thread calls 
mergePassComplete.wait(). If this happens the notification from the merger is 
lost and this thread will just wait ...
Even if that notification is lost, the merge-thread always allows the shuffle 
threads to progress by doing a 'mergeProgress.notifyAll' _before_ it sleeps - 
thereby preventing any deadlocks.
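
As a point of comparison only: a common, different way to make a lost notification harmless is to wait on recorded state rather than on the notification itself. The sketch below shows that alternative; it is not what the patch does, and the class MergePassCounter and its methods are hypothetical.

```java
/** Hypothetical counter of finished merge passes; not part of the actual patch. */
class MergePassCounter {
    private long completedPasses = 0;

    /** Shuffle side: take a snapshot BEFORE notifying the merge thread. */
    synchronized long snapshot() {
        return completedPasses;
    }

    /** Merge side: publish completion as state, then wake all waiters. */
    synchronized void passCompleted() {
        completedPasses++;
        notifyAll();
    }

    /**
     * Shuffle side: wait until a pass has completed since the snapshot.
     * Returns immediately if the pass finished before this call, so an
     * early notifyAll is never "lost". Returns false if interrupted.
     */
    synchronized boolean awaitPassAfter(long snapshot) {
        try {
            while (completedPasses == snapshot) {
                wait();
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, a merge that completes between the snapshot and the wait is still observed, because the waiter checks the counter before blocking.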

{quote}
2) The handshake between the merger, copier and the ramManager looks complex 
and there could be more race conditions like the one I pointed out above. Sharad 
and I had a quick discussion and we feel it can be simplified.
Have the ramManager.reserve lock the thread if the request cannot be satisfied.
Have the ramManager.unreserve do a notifyAll (the mergeThread does this).
Have the shuffle thread notify the mergeThread (before it goes to wait).
{quote}

I agree, it is complicated. However, there are a couple of important points:
1. RamManager.reserve cannot lock the thread without closing the HTTP 
connection; doing so would leak the shuffle into the RamManager, where you'd 
have to pass the HTTP input-stream to RamManager.reserve.
2. It is much better to do a 'notifyAll' on the shuffle threads when the 
_merge_ is complete, so one is reasonably sure that _all_ shuffle threads can 
progress. Doing it in RamManager.unreserve would let only one shuffle thread 
through at a time and the contention for the lock would be very high: every 
thread would wake up and get locked up inside the RamManager again.
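
The two points above can be sketched together. This is a minimal sketch under stated assumptions, not the RamManager in the patch: the class RamBudget, its method names, and the signal counter are hypothetical and exist only to show a non-blocking reserve plus a single wake-up per merge.

```java
/** Hypothetical memory budget; names and sizes are illustrative, not Hadoop's RamManager. */
class RamBudget {
    private final int capacity;
    private int used = 0;
    private int signals = 0; // number of notifyAll calls issued (for illustration)

    RamBudget(int capacity) {
        this.capacity = capacity;
    }

    /**
     * Non-blocking reserve: on failure the caller (the shuffle thread) stays
     * in charge of its own HTTP connection and decides when to close it and wait.
     */
    synchronized boolean tryReserve(int bytes) {
        if (used + bytes > capacity) {
            return false;
        }
        used += bytes;
        return true;
    }

    /** Frees the memory of one merged segment without waking anyone. */
    synchronized void unreserve(int bytes) {
        used -= bytes;
    }

    /** Called once when a merge ends: wake all waiting shuffle threads together. */
    synchronized void mergeComplete() {
        signals++;
        notifyAll();
    }

    synchronized int signals() {
        return signals;
    }
}
```

A merge that frees several segments calls unreserve once per segment but mergeComplete only once, so waiting threads wake when enough memory is likely available for all of them, rather than competing for the lock after every freed segment.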

> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>         Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch, 
> HADOOP-3366_0_20080605.patch, ifile.patch
>
>
> This is intended to be a meta-issue to track various improvements to 
> shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
