[ https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925151#action_12925151 ]
Joydeep Sen Sarma commented on MAPREDUCE-2157:
----------------------------------------------

See https://issues.apache.org/bugzilla/show_bug.cgi?id=44157. With the patch for that bug, log4j sets the interrupted state on threads. I think the bug and its comments suggest that there may be cases where InterruptedException is still being propagated from log4j (which is actually the more likely culprit for MAPREDUCE-2157). So it makes sense, as a precaution, to check for shutdown conditions inside InterruptedException handlers before exiting.

> tasklauncher threads in TaskTracker can die because of unexpected interrupts
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>
> The taskLauncher thread exits on InterruptedException and on interrupt
> conditions without checking for any shutdown flag:
>     while (!Thread.interrupted()) {
>       ...
>       } catch (InterruptedException e) {
>         return; // ALL DONE
>       }
>     }
> If the interrupt happened for reasons other than TaskTracker.close(), the
> TaskTracker will look functional but will not be able to schedule tasks
> anymore. Worse, some tasks (those in the launch queue) will hang
> indefinitely in UNASSIGNED state (the JobTracker will not even time them
> out). We have seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there
> are many calls inside TaskLauncher). See for instance:
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html
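
The precaution described in the comment (check for a shutdown condition inside the InterruptedException handler before exiting) could look roughly like the sketch below. This is only an illustration, not the TaskTracker code or the actual patch: the class ResilientLauncher, the shuttingDown flag, and the helper survivesSpuriousInterrupt() are all hypothetical names made up for this example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a launcher thread; not the real TaskLauncher.
class ResilientLauncher implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Assumed shutdown flag, set only by close().
    private volatile boolean shuttingDown = false;

    void submit(Runnable task) {
        queue.add(task);
    }

    // Intentional shutdown: set the flag first, then interrupt the worker.
    void close(Thread worker) {
        shuttingDown = true;
        worker.interrupt();
    }

    @Override
    public void run() {
        // Loop on the shutdown flag rather than on Thread.interrupted().
        while (!shuttingDown) {
            try {
                Runnable task = queue.take(); // blocks; throws if interrupted
                task.run();
            } catch (InterruptedException e) {
                if (shuttingDown) {
                    return; // interrupt came from close(): exit
                }
                // Otherwise treat the interrupt as unexpected (for example,
                // one set by a logging call) and keep serving the queue.
                // Throwing InterruptedException already cleared the flag.
            }
        }
    }

    // Small self-check: the worker should survive an unexpected interrupt,
    // still launch a queued task, and exit once close() is called.
    static boolean survivesSpuriousInterrupt() {
        ResilientLauncher launcher = new ResilientLauncher();
        Thread worker = new Thread(launcher, "taskLauncher-sketch");
        worker.start();
        CountDownLatch ran = new CountDownLatch(1);
        try {
            worker.interrupt();               // unexpected, not a shutdown
            launcher.submit(ran::countDown);  // must still be launched
            boolean launched = ran.await(5, TimeUnit.SECONDS);
            launcher.close(worker);           // real shutdown
            worker.join(5000);
            return launched && !worker.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, an interrupt flag left behind by a library call makes the next queue.take() throw, but the handler only exits when the shutdown flag is set, so queued tasks are not stranded.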