[jira] Updated: (HADOOP-968) Reduce shuffle and merge should be done a child JVM

Devaraj Das (JIRA) Sat, 14 Apr 2007 06:05:36 -0700

     [ https://issues.apache.org/jira/browse/HADOOP-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Devaraj Das updated HADOOP-968:
-------------------------------

    Attachment: 968.apr14.patch

Thanks for the review, Owen. Some comments below.

> 1. I notice that a lot of your iterators are not typed, causing you to do 
> casts of itr.next(). 
> 2. In many cases, the loop "for(Item item: itemSet){..}" is easier to read 
> and more concise. 
> 3. Maps should not be iterated through using: 
>      for(Map.Entry<Key,Value> item: myMap) {...} 

Done (old habits die hard *smile*).
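
For readers following the thread, here is a minimal sketch of the iteration styles discussed in points 1 to 3. It is not taken from the patch: the class and method names (IterationStyles, sumRaw, sumTyped, describe) are hypothetical and exist only for illustration. Note that java.util.Map does not implement Iterable, so a for-each loop over a map goes through entrySet().

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of the HADOOP-968 patch.
public class IterationStyles {

    // Untyped iterator: every call to next() needs a cast.
    @SuppressWarnings("rawtypes")
    static int sumRaw(List items) {
        int total = 0;
        for (Iterator itr = items.iterator(); itr.hasNext();) {
            total += ((Integer) itr.next()).intValue();
        }
        return total;
    }

    // Typed for-each loop: no cast, and shorter to read.
    static int sumTyped(List<Integer> items) {
        int total = 0;
        for (Integer item : items) {
            total += item;
        }
        return total;
    }

    // A Map is iterated through its entrySet(), not directly.
    static String describe(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            sb.append(entry.getKey()).append('=')
              .append(entry.getValue()).append(';');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3);
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("a", 1);
        counts.put("b", 2);
        System.out.println(sumRaw(values));
        System.out.println(sumTyped(values));
        System.out.println(describe(counts));
    }
}
```

Both sum methods return the same result; the typed version simply moves the type check from a runtime cast to compile time.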

> 4. It looks like each reduce from a job will cause its job's FetchState to be 
> added to the list multiple times, so it will fetch multiple times per loop. 
No change. The loop already has a "break" statement that exits as soon as one 
FetchState gets added. 

> 5. I'd remove the sleep from queryJobTracker and move it to the 
> MapEventsFetcherThread's run loop. 
Done
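
A rough sketch of the shape suggested in point 5, where the query method does no sleeping and the thread's run loop owns the pause between polls. This is not the patch code: PollingLoopSketch, EventSource, FetcherThread, queryOnce and the sleep interval are all hypothetical names and values chosen for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration; names do not come from the HADOOP-968 patch.
public class PollingLoopSketch {

    // Stand-in for whatever is being polled; returns the number of new events.
    interface EventSource {
        int query();
    }

    static class FetcherThread extends Thread {
        private final EventSource source;
        private final long sleepMillis;
        private final AtomicInteger eventsSeen = new AtomicInteger();
        private volatile boolean running = true;

        FetcherThread(EventSource source, long sleepMillis) {
            this.source = source;
            this.sleepMillis = sleepMillis;
        }

        // The query itself never sleeps, so it can be called from anywhere.
        private int queryOnce() {
            return source.query();
        }

        // The run loop decides how long to wait between queries.
        @Override
        public void run() {
            while (running) {
                eventsSeen.addAndGet(queryOnce());
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    return; // interrupted during the pause: stop polling
                }
            }
        }

        void shutdown() {
            running = false;
            interrupt();
        }

        int eventsSeen() {
            return eventsSeen.get();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        FetcherThread fetcher = new FetcherThread(() -> 1, 10L);
        fetcher.start();
        Thread.sleep(100L);
        fetcher.shutdown();
        fetcher.join();
        System.out.println("polled at least once: " + (fetcher.eventsSeen() > 0));
    }
}
```

Keeping the sleep in the loop means the polling interval lives in one place, and the query method stays usable by callers that do not want to block.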

> 6. The doFetch is badly named, since it doesn't actually do the fetch. It 
> should be 
> called findReduces or something. 
Changed that to reducesInShuffle

> 7. The name of the first parameter in 
> TaskUmbilicalProtocol.getMapCompletionEvents is "taskid", but in fact it is a 
> job id. 
Made the name change in TaskUmbilicalProtocol.java

> 8. The MapEventsFetcherThread's name doesn't need to include the task in the 
> normal case, but I guess for unit tests it might be useful. 
No change

> 9. I assume that the shuffle code in ReduceTask matches the old code in 
> ReduceTaskRunner. *smile* 
*Smile* Yes, the only change introduced is to take care of variable 
initializations (for example, the variable reduceTask is initialized 
differently).


> Reduce shuffle and merge should be done a child JVM
> ---------------------------------------------------
>
>                 Key: HADOOP-968
>                 URL: https://issues.apache.org/jira/browse/HADOOP-968
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.10.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.13.0
>
>         Attachments: 968.apr06.patch, 968.apr10.patch, 968.apr14.patch, 
> 968.patch
>
>
> The Reduce's shuffle and initial merge are done in the TaskTracker's JVM. It 
> would be better to have it run in the Task's child JVM. The advantages are:
>   1. The class path and environment would be set up correctly.
>   2. User code doesn't need to be loaded into the TaskTracker.
>   3. Lower memory usage and contention in the TaskTracker.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
