[ https://issues.apache.org/jira/browse/MAPREDUCE-3851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13209877#comment-13209877 ]
Thomas Graves commented on MAPREDUCE-3851:
------------------------------------------

In all the instances I've seen where it hits JETTY-1342, it is hosed pretty much immediately and it doesn't recover after seeing just a couple of the exceptions. That is why I had it be just a straight count over the lifetime. I guess having it be a ratio would make it more extensible for possible future Jetty bugs, though.

So how about we use the ratio over, say, the last 100 requests? If that ratio goes above a certain threshold, then we abort. We could make that number (100) configurable if someone really wants.

> Allow more aggressive action on detection of the jetty issue
> ------------------------------------------------------------
>
>                 Key: MAPREDUCE-3851
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3851
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 1.0.0
>            Reporter: Kihwal Lee
>            Assignee: Thomas Graves
>             Fix For: 1.1.0, 1.0.1
>
>         Attachments: MAPREDUCE-3851.patch
>
>
> MAPREDUCE-2529 added the useful failure detection mechanism. In this jira, I
> propose we add a periodic check inside TT and configurable action to
> self-destruct. Blacklisting helps but is not enough. Hung jetty still accepts
> connection and it takes very long time for clients to fail out. Short jobs
> are delayed for hours because of this. This feature will be a nice companion
> to MAPREDUCE-3184.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
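
For illustration only, the sliding-window ratio proposed in the comment above could be sketched roughly as follows. This is not the code in MAPREDUCE-3851.patch. The class name `FailureRatioWindow`, its methods, and the choice to wait until the window is full before deciding are all assumptions made for this sketch; the window size (100 in the comment) and the threshold are passed in so that either could be made configurable.

```java
/**
 * Hypothetical helper (not taken from the patch): tracks how many of the
 * last N requests failed and reports when the failure ratio exceeds a threshold.
 */
final class FailureRatioWindow {
    private final boolean[] failed;   // circular buffer of recent outcomes
    private final double threshold;   // abort when the ratio is strictly above this
    private int next;                 // index of the slot to overwrite next
    private int count;                // outcomes recorded so far, capped at window size
    private int failures;             // failures currently inside the window

    FailureRatioWindow(int windowSize, double threshold) {
        if (windowSize <= 0) {
            throw new IllegalArgumentException("windowSize must be positive");
        }
        if (threshold < 0.0 || threshold > 1.0) {
            throw new IllegalArgumentException("threshold must be in [0, 1]");
        }
        this.failed = new boolean[windowSize];
        this.threshold = threshold;
    }

    /** Records the outcome of one request, evicting the oldest outcome if the window is full. */
    synchronized void record(boolean requestFailed) {
        if (count == failed.length) {
            if (failed[next]) {
                failures--;           // the oldest outcome leaves the window
            }
        } else {
            count++;
        }
        failed[next] = requestFailed;
        if (requestFailed) {
            failures++;
        }
        next = (next + 1) % failed.length;
    }

    /** Fraction of failed requests among those currently in the window. */
    synchronized double failureRatio() {
        return count == 0 ? 0.0 : (double) failures / count;
    }

    /**
     * Assumption of this sketch: only decide once the window is full, so a
     * single early failure cannot trigger an abort on its own.
     */
    synchronized boolean shouldAbort() {
        return count == failed.length && failureRatio() > threshold;
    }
}
```

A caller would invoke `record(true)` for each request that hits the exception and `record(false)` otherwise, then take the configured action when `shouldAbort()` returns true. Since the comment notes that JETTY-1342 tends to fail immediately and never recover, waiting for a full window delays detection by at most one window of requests; a smaller window would trade robustness for a faster reaction.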