[
https://issues.apache.org/jira/browse/HADOOP-968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Devaraj Das updated HADOOP-968:
-------------------------------
Attachment: 968.apr14.patch
Thanks for the review, Owen. Some comments below.
> 1. I notice that a lot of your iterators are not typed causing you to do
> casts of itr.next
> 2. In many cases, the loop "for(Item item: itemSet){..}" is easier to read
> and more
> concise.
> 3. Maps should not be iterated through using:
> for(Map.Entry<Key,Value> item: myMap) {...}
Done (old habits die hard *smile*).
> 4. It looks like each reduce from a job will cause its job's FetchState to be
> added to
> the list a multiple time, so it will fetch multiple times per a loop.
No change. There is already a "break" statement in the loop as soon as one
FetchState gets added.
> 5. I'd remove the sleep from queryJobTracker and move it to the
> MapEventsFetcherThread's run loop.
Done
> 6. The doFetch is badly named, since it doesn't actually do the fetch. It
> should be
> called findReduces or something.
Changed that to reducesInShuffle
> 7. The name of the parameter of the first parameter in
> TaskUmbilicalProtocol.getMapCompletionEvents is "taskid", but if fact it is a
> job id.
Made the name change in TaskUmbilicalProtocol.java
> 8. The MapEventsFetcherThread's name doesn't need to include the task in the
> normal case, but I guess for unit tests it might be useful.
No change
> 9. I assume that the shuffle code in ReduceTask matches the old code in
> ReduceTaskRunner. *smile*
*Smile* yes the only change that has been introduced to take care of variable
initializations (for example, the variable reduceTask's initialization is
different).
> Reduce shuffle and merge should be done a child JVM
> ---------------------------------------------------
>
> Key: HADOOP-968
> URL: https://issues.apache.org/jira/browse/HADOOP-968
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.10.1
> Reporter: Owen O'Malley
> Assigned To: Devaraj Das
> Fix For: 0.13.0
>
> Attachments: 968.apr06.patch, 968.apr10.patch, 968.apr14.patch,
> 968.patch
>
>
> The Reduce's shuffle and initial merge is done in the TaskTracker's JVM. It
> would be better to have it run in the Task's child JVM. The advantages are:
> 1. The class path and environment would be set up correctly.
> 2. User code doesn't need to be loaded into the TaskTracker.
> 3. Lower memory usage and contention in the TaskTracker.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.