[ 
https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603113#action_12603113
 ] 

Devaraj Das commented on HADOOP-3366:
-------------------------------------

Take this scenario: 
1) Merge is currently in progress. 
2) Shuffle-thread-1 has set stallShuffle=true but was preempted before it 
called mergePassComplete.wait(). 
3) All other shuffle threads now block at shuffle.wait (the 
ramManager.notify has no effect since the merge is already in progress). 
4) The merge completes and invokes mergePassComplete.notifyAll. It then goes 
back to ramManager.wait. 
5) Since shuffle-thread-1 hasn't yet called mergePassComplete.wait, the 
notification that the merge thread sent is lost and all the shuffle threads 
will wait forever. 

Unless I am missing something, isn't the above a possible scenario? 
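
One common way to rule out the lost notification in step 5 is to pair the 
monitor with state that records that a merge pass has finished, and to wait in 
a loop on that state. The following is only a minimal sketch of that idea, not 
code from any of the attached patches: the class MergePassSignal, the counter 
completedPasses and all three method names are hypothetical; only the 
mergePassComplete monitor name is taken from the scenario above.

```java
/**
 * Hypothetical helper (not part of ReduceTask): a completion counter
 * guarded by the mergePassComplete monitor, so that a notifyAll issued
 * before the waiter reaches wait() is still observed.
 */
final class MergePassSignal {
    private final Object mergePassComplete = new Object();
    private long completedPasses = 0; // guarded by mergePassComplete

    /** Shuffle thread: snapshot the counter before deciding to stall. */
    long currentPass() {
        synchronized (mergePassComplete) {
            return completedPasses;
        }
    }

    /** Merge thread: record the finished pass, then wake every waiter. */
    void passCompleted() {
        synchronized (mergePassComplete) {
            completedPasses++;
            mergePassComplete.notifyAll();
        }
    }

    /**
     * Shuffle thread: block until some pass has completed after the
     * snapshot 'seenPass'. Returns false if the thread was interrupted.
     */
    boolean awaitPassAfter(long seenPass) {
        synchronized (mergePassComplete) {
            // The loop re-checks state, so an earlier notifyAll is not
            // lost and spurious wakeups are harmless.
            while (completedPasses <= seenPass) {
                try {
                    mergePassComplete.wait();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }
}
```

Under this assumption, shuffle-thread-1 would call currentPass() before 
setting stallShuffle and then awaitPassAfter(...) with that value. If it is 
preempted in between and the merge finishes first, the counter has already 
advanced and the call returns immediately instead of blocking.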

I think that if the shuffle thread does not get memory from the ram manager 
and the merge thread is waiting at that point, we should do a merge to free up 
space irrespective of the merge criteria. But I also think it is unlikely that 
a merge isn't already in progress when the memory request cannot be serviced. 

The patch I uploaded implements everything in ReduceTask.java (I forgot to 
mention that). I didn't have to touch RamManager.java and I still think it is 
_simpler_ and does exactly what we need (maybe a cleanup is required). If you 
remove the forceMerge condition, it becomes simpler still.

> Shuffle/Merge improvements
> --------------------------
>
>                 Key: HADOOP-3366
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3366
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>            Assignee: Arun C Murthy
>             Fix For: 0.18.0
>
>         Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch, 
> HADOOP-3366_0_20080605.patch, ifile.patch
>
>
> This is intended to be a meta-issue to track various improvements to 
> shuffle/merge in the reducer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
