[ https://issues.apache.org/jira/browse/HADOOP-1270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12493440 ]
Devaraj Das commented on HADOOP-1270:
-------------------------------------

+1

Also, this patch solves another issue that was reported as part of HADOOP-1183: old map events (corresponding to failed fetches from locations that are no longer valid) overwriting new, valid map events. This is because the data structure knownOutputs has been made a List in this patch (earlier it was a Map), so MapOutputLocations get appended to the list rather than overwritten.

> Randomize the fetch of map outputs
> ----------------------------------
>
>                 Key: HADOOP-1270
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1270
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>            Reporter: Arun C Murthy
>         Assigned To: Arun C Murthy
>             Fix For: 0.13.0
>
>         Attachments: HADOOP-1270_20070425_1.patch
>
>
> HADOOP-248 did away with random probing of maps for locating map outputs;
> instead, we now rely on TaskCompletionEvents for the same.
> However, we lost a side benefit of the randomized probing: the map's jetty
> wasn't overloaded with requests for the outputs. We now have a situation
> where a map completes, the JT is notified, *all* the reduces get the
> TaskCompletionEvent and pretty much swamp the poor map's jetty, and this
> repeats for each map.
> I propose we make a minor change where we collect a set of
> TaskCompletionEvents and randomize the list before firing the fetches. Should
> help fix this mass-hysteria at the map's jetty.
> Thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
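
For readers following the thread, the two ideas discussed above (collecting a batch of map output locations and shuffling it before fetching, and keeping knownOutputs as a List so a newer location is appended instead of replacing an older one) can be sketched roughly as below. This is only an illustration and not the code in HADOOP-1270_20070425_1.patch: the class ShuffleFetchSketch, the simplified MapOutputLocation with its mapId and host fields, and the methods addKnownOutput and nextFetchBatch are all hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffleFetchSketch {

    /** Hypothetical, simplified stand-in for a map output location. */
    static final class MapOutputLocation {
        final int mapId;
        final String host;

        MapOutputLocation(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }

        @Override
        public String toString() {
            return "map_" + mapId + "@" + host;
        }
    }

    // A List rather than a Map keyed by map id: a later location for the
    // same map is appended and does not replace (or get replaced by) another.
    private final List<MapOutputLocation> knownOutputs = new ArrayList<>();
    private final Random random;

    ShuffleFetchSketch(Random random) {
        this.random = random;
    }

    /** Records one location learned from a (hypothetical) completion event. */
    void addKnownOutput(MapOutputLocation loc) {
        knownOutputs.add(loc);
    }

    /** Drains the pending locations and returns them in random order. */
    List<MapOutputLocation> nextFetchBatch() {
        List<MapOutputLocation> batch = new ArrayList<>(knownOutputs);
        knownOutputs.clear();
        // Each reduce shuffles independently, so reduces are unlikely to
        // all hit the same map's server at the same moment.
        Collections.shuffle(batch, random);
        return batch;
    }

    public static void main(String[] args) {
        ShuffleFetchSketch reducer = new ShuffleFetchSketch(new Random());
        reducer.addKnownOutput(new MapOutputLocation(0, "host-a"));
        reducer.addKnownOutput(new MapOutputLocation(1, "host-b"));
        reducer.addKnownOutput(new MapOutputLocation(0, "host-c")); // re-run of map 0
        for (MapOutputLocation loc : reducer.nextFetchBatch()) {
            System.out.println("fetch " + loc);
        }
    }
}
```

In this sketch both entries for map 0 survive in the batch; deciding which of them is still valid is left to the fetch logic, which the sketch does not model.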