[
https://issues.apache.org/jira/browse/HADOOP-968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12488787
]
Owen O'Malley commented on HADOOP-968:
--------------------------------------
1. I notice that a lot of your iterators are not typed, which forces you to
cast the result of itr.next().
2. In many cases, the loop "for(Item item: itemSet){..}" is easier to read and
more concise.
3. Maps should be iterated through using:
for(Map.Entry<Key,Value> item: myMap.entrySet()) {...}
rather than:
Iterator itr = myMap.keySet().iterator();
while (itr.hasNext()) {
Value value = myMap.get(itr.next());
...
}
4. It looks like each reduce from a job will cause its job's FetchState to be
added to the list multiple times, so it will fetch multiple times per loop.
5. I'd remove the sleep from queryJobTracker and move it to the
MapEventsFetcherThread's run loop.
6. The doFetch method is badly named, since it doesn't actually do the fetch.
It should be called findReduces or something.
7. The name of the first parameter in
TaskUmbilicalProtocol.getMapCompletionEvents is "taskid", but in fact it is a
job id.
8. The MapEventsFetcherThread's name doesn't need to include the task in the
normal case, but I guess for unit tests it might be useful.
9. I assume that the shuffle code in ReduceTask matches the old code in
ReduceTaskRunner. *smile*
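
To illustrate points 1 through 3, here is a minimal, self-contained sketch
that contrasts the two iteration styles. The class name, method names, and
sample keys are hypothetical and are not taken from the patch; the example
only assumes a Map<String, Integer> for demonstration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

public class MapIterationSketch {

    // Style discouraged in points 1 and 3: a raw (untyped) iterator over the
    // key set, which needs a cast and a second lookup for every key.
    @SuppressWarnings({"rawtypes"})
    static int sumWithRawIterator(Map counts) {
        int total = 0;
        Iterator itr = counts.keySet().iterator();
        while (itr.hasNext()) {
            Integer value = (Integer) counts.get(itr.next());
            total += value.intValue();
        }
        return total;
    }

    // Style suggested in points 2 and 3: a typed for-each loop over
    // entrySet(), with no casts and no extra lookup.
    static int sumWithEntrySet(Map<String, Integer> counts) {
        int total = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            total += entry.getValue().intValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("first", 3);
        counts.put("second", 5);

        System.out.println(sumWithRawIterator(counts));
        System.out.println(sumWithEntrySet(counts));
    }
}
```

Both methods compute the same sum; the second is shorter and lets the
compiler check the key and value types.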
> Reduce shuffle and merge should be done a child JVM
> ---------------------------------------------------
>
> Key: HADOOP-968
> URL: https://issues.apache.org/jira/browse/HADOOP-968
> Project: Hadoop
> Issue Type: Improvement
> Components: mapred
> Affects Versions: 0.10.1
> Reporter: Owen O'Malley
> Assigned To: Devaraj Das
> Fix For: 0.13.0
>
> Attachments: 968.apr06.patch, 968.apr10.patch, 968.patch
>
>
> The Reduce's shuffle and initial merge are done in the TaskTracker's JVM. It
> would be better to have it run in the Task's child JVM. The advantages are:
> 1. The class path and environment would be set up correctly.
> 2. User code doesn't need to be loaded into the TaskTracker.
> 3. Lower memory usage and contention in the TaskTracker.