[ https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342828#comment-15342828 ]
Hudson commented on MAPREDUCE-6197:
-----------------------------------
SUCCESS: Integrated in Hadoop-trunk-Commit #9997 (See
[https://builds.apache.org/job/Hadoop-trunk-Commit/9997/])
MAPREDUCE-6197. Cache MapOutputLocations in ShuffleHandler. Contributed
(jianhe: rev d8107fcd1c93c202925f2946d0cd4072fe0aef1e)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java
> Cache MapOutputLocations in ShuffleHandler
> ------------------------------------------
>
> Key: MAPREDUCE-6197
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Siddharth Seth
> Assignee: Junping Du
> Fix For: 2.9.0
>
> Attachments: MAPREDUCE-6197.patch
>
>
> ShuffleHandler currently seems to create a map of mapId -> mapInfo (file.out /
> index information) when it receives a message.
> It should cache map info across requests, so that a scan of all
> directories is not required for each reducer fetching from the same map.
> Also, the scan for each map output / index file is performed twice per mapId
> within a request, in populateHeaders: once in the call to getMapOutputInfo,
> and then again directly in the method.
> For an invocation that ends up with more than 1000 (default) mapIds in
> a single call, and does not cache them in the map, the path constructed for
> such entries will be invalid. This is highly unlikely to be the case though,
> until there's proper caching.
> {code}
> MapOutputInfo info = mapOutputInfoMap.get(mapId);
> if (info == null) {
>   info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
> }
> {code}
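
The caching idea in the description can be sketched roughly as follows. This is only an illustration, not the code from the committed patch: the class MapOutputLocationCache, the Location holder, the string key layout, and the scanner function are all hypothetical names introduced here. It also assumes that the location of a map's output depends only on the user, job, and map attempt (not on the reducer), and it has no eviction, which a real handler would need.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical sketch of a cross-request cache; not the actual ShuffleHandler code. */
class MapOutputLocationCache {

    /** Stand-in for the resolved file.out and index file paths of one map attempt. */
    static final class Location {
        final String dataPath;
        final String indexPath;

        Location(String dataPath, String indexPath) {
            this.dataPath = dataPath;
            this.indexPath = indexPath;
        }
    }

    // Shared across requests, so every reducer fetching from the same map reuses one entry.
    private final Map<String, Location> cache = new ConcurrentHashMap<>();

    // Stand-in for the expensive scan over the local directories (assumed signature).
    private final Function<String, Location> scanner;

    // Counts how often the expensive scan actually ran; only here for illustration.
    private final AtomicInteger scans = new AtomicInteger();

    MapOutputLocationCache(Function<String, Location> scanner) {
        this.scanner = scanner;
    }

    /** Returns the cached location, scanning at most once per (user, job, map attempt). */
    Location get(String user, String jobId, String mapId) {
        // Assumption: the key does not need the reduceId.
        String key = user + "/" + jobId + "/" + mapId;
        return cache.computeIfAbsent(key, k -> {
            scans.incrementAndGet();
            return scanner.apply(k);
        });
    }

    int scanCount() {
        return scans.get();
    }

    public static void main(String[] args) {
        MapOutputLocationCache locations = new MapOutputLocationCache(
            k -> new Location(k + "/file.out", k + "/file.out.index"));
        // Three fetches for the same map attempt, as if from three different reducers.
        for (int i = 0; i < 3; i++) {
            locations.get("alice", "job_0001", "attempt_0001_m_000000_0");
        }
        System.out.println("scans = " + locations.scanCount()); // prints "scans = 1"
    }
}
```

Because the lookup goes through a single get call, the result could also be reused inside one request instead of resolving the same files twice. A production version would presumably bound the cache size and expire entries once the job's output is cleaned up.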