[ 
https://issues.apache.org/jira/browse/HADOOP-3517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12603947#action_12603947
 ] 

Arun C Murthy commented on HADOOP-3517:
---------------------------------------

bq. Won't the shuffle threads and the merge thread both be waiting in the 
above case?

You are right and the same hit me last evening too! *smile*

So here is the proposal to get around this case:

In ShuffleRamManager.waitForDataToMerge we should check two more conditions 
to break the deadlock:
1. The number of shuffle threads blocked on the ShuffleRamManager is greater 
than 75% of the total number of shuffle threads. This ensures that we don't 
block the merge when there is no possibility of more data becoming available.
2. To cover the case where we never reach the point where 75% of the 
shuffle threads are blocked, we need to un-stall the merge when the number of 
blocked shuffle threads is greater than or equal to the number of required 
map-outputs.
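
A minimal sketch of the two checks above, written as a standalone predicate. 
The class name MergeStallCheck, the method shouldUnstallMerge and its 
parameters are hypothetical and chosen only for illustration; this is not the 
actual ShuffleRamManager code, and the existing wait condition is left out.

```java
/** Hypothetical helper; names are illustrative, not Hadoop's actual code. */
final class MergeStallCheck {
    // Assumed threshold taken from condition 1 of the proposal (75%).
    static final float BLOCKED_FRACTION = 0.75f;

    /**
     * Returns true if the merge thread should stop waiting.
     *
     * @param blockedShuffleThreads shuffle threads currently blocked on the RAM manager
     * @param totalShuffleThreads   total number of shuffle threads
     * @param requiredMapOutputs    number of required map-outputs
     */
    static boolean shouldUnstallMerge(int blockedShuffleThreads,
                                      int totalShuffleThreads,
                                      int requiredMapOutputs) {
        // Condition 1: strictly more than 75% of the shuffle threads are blocked.
        boolean mostThreadsBlocked =
            blockedShuffleThreads > BLOCKED_FRACTION * totalShuffleThreads;

        // Condition 2: at least as many threads are blocked as there are
        // required map-outputs, so no further data can arrive without a merge.
        boolean enoughBlockedForRequired =
            blockedShuffleThreads >= requiredMapOutputs;

        return mostThreadsBlocked || enoughBlockedForRequired;
    }
}
```

In a real wait loop this predicate would presumably be evaluated together 
with the existing condition each time a shuffle thread blocks or unblocks. 
Note that, read literally, the second check is trivially true once zero 
map-outputs are required.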

Thoughts?

> The last InMemory merge may be missed
> -------------------------------------
>
>                 Key: HADOOP-3517
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3517
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.18.0
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>            Priority: Blocker
>             Fix For: 0.18.0
>
>
> This is post HADOOP-3366. The inmem merge thread has the loop:
> {code}
>         while (!exitInMemMerge) {
>           ramManager.waitForDataToMerge();
>           doInMemMerge();
>         }
> {code}
> The fetchOutputs, at the end of copying everything, does the following:
> {code}
>         exitInMemMerge = true; 
>         ramManager.close();
> {code}
> Now if the merge thread is doing a merge (inside the doInMemMerge method) 
> when exitInMemMerge is set to true, the loop will exit and the last merge 
> of the recently shuffled files will be skipped. ramManager.close(), which 
> internally does a final notify to the merge thread, also won't have any 
> effect in this case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
