[ https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925212#action_12925212 ]

Joydeep Sen Sarma commented on MAPREDUCE-2157:
----------------------------------------------

Another thing I figured out: we are actually running log4j 1.2.15, which does not have the fix for the bug referenced above. I looked through the log4j 1.2.15 code and couldn't figure out what sets the interrupted status of the thread. I finally found that the JVM's PrintStream class does it in the guts of the write() method (look at the source in http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/cf44386c8fe3/src/share/classes/java/io/PrintStream.java).

So everything's explained. The log4j info() call went to PrintStream.write(), because the output goes to a ConsoleAppender that is redirected to a file. For some reason the write system call was interrupted, and PrintStream set the thread's interrupted status. The taskLauncher thread then hit a wait() call, got an InterruptedException, and terminated.

> tasklauncher threads in TaskTracker can die because of unexpected interrupts
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>
> The taskLauncher thread exits on InterruptedException and on interrupt conditions
> without checking for any shutdown flag:
>     while (!Thread.interrupted()) {
>       ...
>       } catch (InterruptedException e) {
>         return; // ALL DONE
>       }
>     }
> If the interrupt happened for reasons other than TaskTracker.close(),
> then the TaskTracker will look functional but will not be able to schedule
> tasks anymore. Worse, some tasks (those in the launch queue) will hang
> indefinitely in the UNASSIGNED state (the JobTracker will not even time them
> out). We have seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there
> are many calls inside TaskLauncher).
> See for instance:
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
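
For illustration only, here is a minimal sketch of the kind of guard the description asks for: a launcher loop that exits on an interrupt only when a shutdown flag has been set. This is not the TaskTracker code or the actual patch. The class name ResilientLauncher, the shuttingDown flag, and the helper survivesStrayInterrupt() are all hypothetical names made up for this example. The stray interrupt is simulated by a task that interrupts its own thread, standing in for a library call that leaves the interrupted status set.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a launcher thread; not the real TaskLauncher.
final class ResilientLauncher extends Thread {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Set only by shutdown(); lets the loop tell a real stop request from a stray interrupt.
    private volatile boolean shuttingDown = false;
    final AtomicInteger launched = new AtomicInteger();

    ResilientLauncher() {
        super("resilient-launcher");
        setDaemon(true);
    }

    void submit(Runnable task) {
        queue.add(task);
    }

    void shutdown() {
        shuttingDown = true;
        interrupt(); // wakes the thread if it is blocked in take()
    }

    @Override
    public void run() {
        while (!shuttingDown) {
            try {
                Runnable task = queue.take();
                task.run();
                launched.incrementAndGet();
            } catch (InterruptedException e) {
                // take() clears the interrupted status when it throws.
                // Exit only if shutdown() was called; otherwise keep serving the queue.
                if (shuttingDown) {
                    return;
                }
            }
        }
    }

    /** Returns true if the launcher survives a stray interrupt and still stops on shutdown(). */
    static boolean survivesStrayInterrupt() {
        ResilientLauncher launcher = new ResilientLauncher();
        launcher.start();
        // Simulates a library call that leaves the interrupted status set on the thread.
        launcher.submit(() -> Thread.currentThread().interrupt());
        launcher.submit(() -> { });
        try {
            long deadline = System.currentTimeMillis() + 5000;
            while (launcher.launched.get() < 2 && System.currentTimeMillis() < deadline) {
                Thread.sleep(10);
            }
            boolean ranBoth = launcher.launched.get() == 2 && launcher.isAlive();
            launcher.shutdown();
            launcher.join(5000);
            return ranBoth && !launcher.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("survived stray interrupt: " + survivesStrayInterrupt());
    }
}
```

The design choice in this sketch is that the interrupt is treated purely as a wake-up signal, and the volatile flag is the only thing that decides whether the thread exits. A loop written as while (!Thread.interrupted()) with a bare return in the catch block cannot make that distinction.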