[ https://issues.apache.org/jira/browse/HADOOP-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12522193 ]
Arun C Murthy commented on HADOOP-1158:
---------------------------------------

Thanks for the review, Enis.

So, here is how we solve issues emanating from Jetty: if there are sufficient fetch failures for a given map (say, due to Jetty), we just fail the map and re-run it elsewhere, so the reducer isn't stuck. If a sufficient number of maps then fail on the same TaskTracker (say, Jetty again), it gets blacklisted and no further tasks are assigned to it... does that make sense?

Please feel free to open further issues if you have other thoughts to help improve things...

> JobTracker should collect statistics of failed map output fetches, and take
> decisions to reexecute map tasks and/or restart the (possibly faulty) Jetty
> server on the TaskTracker
> ---------------------------------------------------------------------------
>
>                 Key: HADOOP-1158
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1158
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.12.2
>            Reporter: Devaraj Das
>            Assignee: Arun C Murthy
>             Fix For: 0.15.0
>
>         Attachments: HADOOP-1158_20070702_1.patch,
> HADOOP-1158_2_20070808.patch, HADOOP-1158_3_20070809.patch,
> HADOOP-1158_4_20070817.patch, HADOOP-1158_5_20070823.patch
>
>
> The JobTracker should keep track (with feedback from Reducers) of how many
> times a fetch for a particular map output failed. If this exceeds a certain
> threshold, then that map should be declared as lost, and should be reexecuted
> elsewhere. Based on the number of such complaints from Reducers, the
> JobTracker can blacklist the TaskTracker. This will make the framework
> reliable - it will take care of (faulty) TaskTrackers that sometimes or always
> fail to serve up map outputs (for which exceptions are not properly
> raised/handled, e.g., if the exception/problem happens in the Jetty
> server).
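
For readers following along, the policy described in the comment above could be sketched roughly as follows. This is only an illustration, not the code from the attached patches: the class name FetchFailureTracker, its method names, and both threshold parameters are hypothetical, and the thread does not state the actual threshold values.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch of the bookkeeping discussed in HADOOP-1158.
 * None of these names come from the actual patch.
 */
final class FetchFailureTracker {
    // Assumed thresholds; the issue only speaks of "a certain threshold".
    private final int maxFetchFailuresPerMap;
    private final int maxFailedMapsPerTracker;

    // Fetch failures reported by reducers, keyed by map task id.
    private final Map<String, Integer> fetchFailures = new HashMap<>();
    // Number of maps declared lost, keyed by the TaskTracker that ran them.
    private final Map<String, Integer> failedMapsPerTracker = new HashMap<>();
    private final Set<String> blacklisted = new HashSet<>();

    FetchFailureTracker(int maxFetchFailuresPerMap, int maxFailedMapsPerTracker) {
        this.maxFetchFailuresPerMap = maxFetchFailuresPerMap;
        this.maxFailedMapsPerTracker = maxFailedMapsPerTracker;
    }

    /**
     * Records one failed fetch of a map's output.
     * Returns true when the map should be failed and re-run elsewhere.
     */
    boolean reportFetchFailure(String mapTaskId, String trackerName) {
        int failures = fetchFailures.merge(mapTaskId, 1, Integer::sum);
        if (failures < maxFetchFailuresPerMap) {
            return false; // not enough complaints yet; keep the map output
        }
        // The map is declared lost; its re-run starts with a clean count.
        fetchFailures.remove(mapTaskId);
        int failedMaps = failedMapsPerTracker.merge(trackerName, 1, Integer::sum);
        if (failedMaps >= maxFailedMapsPerTracker) {
            blacklisted.add(trackerName); // no further tasks go to this tracker
        }
        return true;
    }

    boolean isBlacklisted(String trackerName) {
        return blacklisted.contains(trackerName);
    }
}
```

With thresholds of, say, 3 and 2, the third failed fetch of a map's output marks that map for re-execution, and the second such lost map on the same TaskTracker blacklists it. Keeping the two counters separate mirrors the two-level decision in the comment: a single bad map is retried elsewhere first, and only repeated losses on one node take that node out of scheduling.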