[jira] [Created] (MAPREDUCE-6197) Cache MapOutputLocations in ShuffleHandler

Siddharth Seth (JIRA) Mon, 15 Dec 2014 17:12:02 -0800

Siddharth Seth created MAPREDUCE-6197:
-----------------------------------------


             Summary: Cache MapOutputLocations in ShuffleHandler
                 Key: MAPREDUCE-6197
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6197
             Project: Hadoop Map/Reduce
          Issue Type: Bug
            Reporter: Siddharth Seth


ShuffleHandler currently appears to build a map of mapId -> mapInfo (file.out / 
index information) for each message it receives.
This map info should be cached across requests, so that a scan of all 
directories is not required for each reducer fetching from the same map.
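
One possible shape for such a cross-request cache is sketched below. This is an 
illustration only, not the ShuffleHandler code or a proposed patch: the class 
name MapOutputLocationCache, the Location holder, the user + "/" + mapId key and 
the scanner callback are all hypothetical. It uses a size-bounded, 
access-ordered LinkedHashMap so that repeated fetches of the same map output 
reuse a single directory scan.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical bounded LRU cache of map output locations, shared across requests. */
final class MapOutputLocationCache {

  /** Hypothetical stand-in for the per-map info (output file and index file paths). */
  public static final class Location {
    public final String outputPath;
    public final String indexPath;

    public Location(String outputPath, String indexPath) {
      this.outputPath = outputPath;
      this.indexPath = indexPath;
    }
  }

  private final Map<String, Location> cache;

  MapOutputLocationCache(final int maxEntries) {
    // accessOrder=true yields least-recently-used ordering;
    // removeEldestEntry evicts once the bound is exceeded.
    this.cache = new LinkedHashMap<String, Location>(16, 0.75f, true) {
      @Override
      protected boolean removeEldestEntry(Map.Entry<String, Location> eldest) {
        return size() > maxEntries;
      }
    };
  }

  /**
   * Returns the cached location, invoking the (assumed expensive) scanner only on a miss.
   * The lock is held during the scan, which keeps the sketch simple but serializes misses.
   */
  synchronized Location get(String user, String mapId, Function<String, Location> scanner) {
    String key = user + "/" + mapId; // assumed key layout, for illustration only
    Location loc = cache.get(key);
    if (loc == null) {
      loc = scanner.apply(mapId);
      cache.put(key, loc);
    }
    return loc;
  }

  synchronized int size() {
    return cache.size();
  }
}
```

A real implementation would also need to invalidate entries when a job's 
outputs are cleaned up, which this sketch does not attempt.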

Also, the scan for each map output / index file is performed twice per mapId 
within a request: in populateHeaders, once in the call to getMapOutputInfo 
and then again directly in the method.
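
The duplicate scan within a request could be avoided by resolving each mapId 
once and reusing the result. The helper below is a minimal sketch under that 
assumption; RequestScopedLookup and its resolver callback are hypothetical 
names, not part of ShuffleHandler.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-request helper: each mapId is resolved at most once. */
final class RequestScopedLookup<V> {
  private final Map<String, V> resolved = new HashMap<>();
  private final Function<String, V> resolver;

  RequestScopedLookup(Function<String, V> resolver) {
    this.resolver = resolver;
  }

  /** Runs the resolver on the first call for a mapId; later calls return the stored value. */
  V resolve(String mapId) {
    return resolved.computeIfAbsent(mapId, resolver);
  }
}
```

The instance is meant to live for a single request and is not thread-safe.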

For an invocation that carries more than 1000 (the default) mapIds in a 
single call, the entries beyond that limit are not cached in the map, and the 
path constructed for such entries will be invalid. This is highly unlikely to 
occur in practice, though, until there is proper caching.
{code}
MapOutputInfo info = mapOutputInfoMap.get(mapId);
if (info == null) {
  info = getMapOutputInfo(outputBasePathStr, mapId, reduceId, user);
}
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
