[ https://issues.apache.org/jira/browse/HADOOP-3366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603113#action_12603113 ]
Devaraj Das commented on HADOOP-3366:
-------------------------------------
Consider this scenario:
1) A merge is currently in progress.
2) Shuffle-thread-1 has set stallShuffle=true but is preempted before it calls
mergePassComplete.wait().
3) All other shuffle threads now block at shuffle.wait (the ramManager.notify
has no effect because the merge is already in progress).
4) The merge completes and invokes mergePassComplete.notifyAll. It then goes
back to ramManager.wait.
5) Since Shuffle-thread-1 had not yet called mergePassComplete.wait, the
notification that the merge thread sent is lost, and all the shuffle threads
wait forever.
Unless I am missing something, isn't the above a possible scenario?
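One common way to close this kind of window is to pair the notification with a
flag that is read and written under the same lock, so a waiter that arrives late
still sees that the merge has finished. The following is only a minimal sketch
of that idea, not code from either patch: the class name, the mergeInProgress
flag, and both method names are hypothetical, and only mergePassComplete is
borrowed from the discussion above.

```java
import java.util.concurrent.TimeUnit;

public class LostNotifySketch {
    // Monitor named after the one in the discussion; everything else is hypothetical.
    private final Object mergePassComplete = new Object();
    // Assumption: this flag is only read or written while holding mergePassComplete.
    private boolean mergeInProgress = true;

    // Merge side: record completion under the lock, then wake any waiters.
    public void mergeDone() {
        synchronized (mergePassComplete) {
            mergeInProgress = false;
            mergePassComplete.notifyAll();
        }
    }

    // Shuffle side: wait in a loop on the flag, not on the notification alone.
    // Returns true once the merge is known to be done, false on timeout or interrupt.
    public boolean awaitMergePass(long timeoutMs) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        synchronized (mergePassComplete) {
            while (mergeInProgress) {
                long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadline - System.nanoTime());
                if (remainingMs <= 0) {
                    return false; // timed out while the merge was still running
                }
                try {
                    mergePassComplete.wait(remainingMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        LostNotifySketch sketch = new LostNotifySketch();
        // Replay the interleaving from the scenario: the merge finishes and
        // notifies before the shuffle thread has started waiting.
        sketch.mergeDone();
        System.out.println("late waiter released: " + sketch.awaitMergePass(1000));
    }
}
```

With a bare wait() and no flag, the late waiter in main would block until some
later notification arrives; with the flag it returns immediately because it
never enters the wait.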
I think that if the shuffle thread does not get memory from the ram manager and
the merge thread is idle (waiting) at that point, we should do a merge to free
up space irrespective of the usual merge criteria. But I also think it is
unlikely that a merge isn't already in progress when a memory request cannot be
serviced.
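The rule proposed in the paragraph above can be written as a small decision
function. This is only an illustrative sketch: the class, the method, and all
three parameter names are hypothetical and do not correspond to anything in
ReduceTask.java or RamManager.java.

```java
public class ForceMergeSketch {
    /**
     * Decide whether a new in-memory merge should be started.
     *
     * @param reservationFailed a shuffle thread could not get memory
     * @param mergeInProgress   the merge thread is already merging
     * @param normalCriteriaMet the usual merge criteria are satisfied
     */
    public static boolean shouldStartMerge(boolean reservationFailed,
                                           boolean mergeInProgress,
                                           boolean normalCriteriaMet) {
        if (mergeInProgress) {
            // A running merge will free memory when it finishes; do not start another.
            return false;
        }
        // The merge thread is idle: merge on the usual criteria, or force a
        // merge because a memory request could not be serviced.
        return normalCriteriaMet || reservationFailed;
    }

    public static void main(String[] args) {
        System.out.println("failed reservation, idle merger: "
                + shouldStartMerge(true, false, false));
        System.out.println("failed reservation, merge running: "
                + shouldStartMerge(true, true, false));
    }
}
```

The second case is the one described as the likely one: the memory request
fails while a merge is already running, so nothing new needs to be started.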
The patch I uploaded implements everything in ReduceTask.java (I forgot to
mention that). I didn't have to touch RamManager.java, and I still think it is
_simpler_ and does exactly what we need, though it may need a cleanup. If you
remove the forceMerge condition, it becomes simpler still.
> Shuffle/Merge improvements
> --------------------------
>
> Key: HADOOP-3366
> URL: https://issues.apache.org/jira/browse/HADOOP-3366
> Project: Hadoop Core
> Issue Type: Improvement
> Components: mapred
> Reporter: Arun C Murthy
> Assignee: Arun C Murthy
> Fix For: 0.18.0
>
> Attachments: 3366.1.patch, 3366.1.patch, 3366.reducetask.patch,
> HADOOP-3366_0_20080605.patch, ifile.patch
>
>
> This is intended to be a meta-issue to track various improvements to
> shuffle/merge in the reducer.