[ https://issues.apache.org/jira/browse/MAPREDUCE-6197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342828#comment-15342828 ]
Hudson commented on MAPREDUCE-6197:
-----------------------------------

SUCCESS: Integrated in Hadoop-trunk-Commit #9997 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/9997/])
MAPREDUCE-6197. Cache MapOutputLocations in ShuffleHandler. Contributed (jianhe: rev d8107fcd1c93c202925f2946d0cd4072fe0aef1e)
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/main/java/org/apache/hadoop/mapred/ShuffleHandler.java
* hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-shuffle/src/test/java/org/apache/hadoop/mapred/TestShuffleHandler.java


> Cache MapOutputLocations in ShuffleHandler
> ------------------------------------------
>
>                 Key: MAPREDUCE-6197
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>   Affects Versions: 2.6.0
>            Reporter: Siddharth Seth
>            Assignee: Junping Du
>             Fix For: 2.9.0
>
>         Attachments: MAPREDUCE-6197.patch
>
>
> ShuffleHandler currently seems to create a map of mapId - mapInfo (file.out /
> index information) each time it receives a message.
> It should instead cache map info across requests, so that a scan of all
> directories is not required for each reducer fetching from the same map.
> Also, the scan for each map output / index file is performed twice per mapId
> within a request: in populateHeaders, once in the call to getMapOutputInfo,
> and then again directly in the method.
> For an invocation where we do end up with more than 1000 (default) mapIds in
> a single call, and don't cache them in the map, the path constructed for
> such entries will be invalid. This is highly unlikely to be the case though,
> until there's proper caching.
> {code}
>           MapOutputInfo info = mapOutputInfoMap.get(mapId);
>           if (info == null) {
>             info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
>           }
> {code}
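
The caching described in the issue can be illustrated with a minimal sketch of a bounded, cross-request lookup cache. This is not the committed patch: MapOutputLocation, MapOutputLocationCache, the "user/mapId" key format, and the resolver function are all hypothetical names introduced here, and the resolver merely stands in for the directory scan the issue wants to avoid repeating.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the file.out / index locations of one map output. */
final class MapOutputLocation {
    final String indexPath;
    final String dataPath;

    MapOutputLocation(String indexPath, String dataPath) {
        this.indexPath = indexPath;
        this.dataPath = dataPath;
    }
}

/** Hypothetical bounded LRU cache shared across shuffle requests. */
final class MapOutputLocationCache {
    private final Map<String, MapOutputLocation> cache;
    private final Function<String, MapOutputLocation> resolver;
    private int resolverCalls = 0;

    MapOutputLocationCache(final int maxEntries,
                           Function<String, MapOutputLocation> resolver) {
        this.resolver = resolver;
        // accessOrder=true makes iteration order least-recently-used first,
        // so removeEldestEntry evicts the LRU entry once the bound is exceeded.
        this.cache = new LinkedHashMap<String, MapOutputLocation>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, MapOutputLocation> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached location, running the expensive resolver only on a miss. */
    synchronized MapOutputLocation get(String user, String mapId) {
        String key = user + "/" + mapId; // assumed key format
        MapOutputLocation location = cache.get(key);
        if (location == null) {
            resolverCalls++;
            location = resolver.apply(key);
            cache.put(key, location);
        }
        return location;
    }

    synchronized int resolverCalls() {
        return resolverCalls;
    }

    synchronized int cachedEntries() {
        return cache.size();
    }
}
```

With a cache like this, several reducers fetching the same mapId trigger only one lookup, and a single request performs the lookup at most once per mapId. The size bound keeps memory use predictable; an entry evicted under pressure is simply resolved again on its next use.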