[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925212#action_12925212
 ] 

Joydeep Sen Sarma commented on MAPREDUCE-2157:
----------------------------------------------

Another thing I figured out: we are actually running log4j 1.2.15, which does 
not have the fix for the bug referenced above. I looked through the log4j 
1.2.15 code and couldn't figure out who sets the interrupted status of the 
thread. I finally found that the JVM's PrintStream class does it in the guts of 
the write() method: it catches InterruptedIOException and calls 
Thread.currentThread().interrupt() (look at the source in 
http://hg.openjdk.java.net/jdk7/jdk7/jdk/file/cf44386c8fe3/src/share/classes/java/io/PrintStream.java).

So everything is explained. The log4j info() call went to PrintStream.write(), 
because we log through a ConsoleAppender whose output is redirected to a file. 
For some reason the write system call was interrupted, and PrintStream set the 
thread's interrupted status. The taskLauncher thread then hit its wait() call, 
got an InterruptedException, and terminated.
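
The chain above can be reproduced in isolation. The following is a minimal 
sketch, not the TaskTracker code: InterruptedSink and waitAfterLogging are 
hypothetical names, and the sink simply simulates an interrupted write by 
throwing InterruptedIOException. It assumes only the documented PrintStream 
behaviour of swallowing that exception and setting the interrupt flag.

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.io.PrintStream;

public class PrintStreamInterruptDemo {

    // Hypothetical stand-in for a redirected console: every write
    // fails as if the underlying system call had been interrupted.
    static class InterruptedSink extends OutputStream {
        @Override
        public void write(int b) throws IOException {
            throw new InterruptedIOException("simulated interrupted write");
        }
    }

    // Returns true if a later wait() on the same thread is interrupted.
    static boolean waitAfterLogging() {
        PrintStream out = new PrintStream(new InterruptedSink(), true);
        // PrintStream catches the InterruptedIOException internally and
        // sets the calling thread's interrupt flag instead of rethrowing.
        out.println("launching task");

        Object lock = new Object();
        synchronized (lock) {
            try {
                lock.wait(100); // nobody interrupts us explicitly
                return false;
            } catch (InterruptedException e) {
                return true;    // the flag left behind by println() fired here
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("wait() interrupted: " + waitAfterLogging());
    }
}
```

Nothing in this sketch calls Thread.interrupt() directly; the only source of 
the interrupt is the logging call, which is the same situation the 
taskLauncher thread found itself in.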

> tasklauncher threads in TaskTracker can die because of unexpected interrupts
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>
> The taskLauncher thread exits on InterruptedException and on interrupt 
> conditions without checking for any shutdown flag:
>      while (!Thread.interrupted()) {
>        try {
>          ...
>        } catch (InterruptedException e) {
>          return; // ALL DONE
>        }
>      }
> If the interrupt happens for any reason other than TaskTracker.close(), 
> the TaskTracker will look functional but will not be able to schedule 
> tasks anymore. Worse, some tasks (those in the launch queue) will hang 
> indefinitely in the UNASSIGNED state (the JobTracker will not even time them 
> out). We have seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there 
> are many calls inside TaskLauncher). See for instance: 
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html
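
One possible way to guard against this is to exit the loop only when an 
explicit shutdown flag is set. This is a sketch under assumptions, not the 
actual patch: GuardedLauncher, shuttingDown, addTask and launchedCount are 
hypothetical names, and a BlockingQueue stands in for the real launch queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class GuardedLauncher extends Thread {
    // Hypothetical stand-in for the queue of tasks waiting to be launched.
    private final BlockingQueue<Runnable> tasksToLaunch = new LinkedBlockingQueue<>();
    // Hypothetical flag, set only by close().
    private volatile boolean shuttingDown = false;
    private final AtomicInteger launched = new AtomicInteger();

    public void addTask(Runnable task) { tasksToLaunch.add(task); }

    public int launchedCount() { return launched.get(); }

    // The only legitimate way to stop the thread.
    public void close() {
        shuttingDown = true;
        interrupt(); // wake the thread if it is blocked in take()
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's status
        }
    }

    @Override
    public void run() {
        while (!shuttingDown) {
            try {
                Runnable task = tasksToLaunch.take();
                task.run();
                launched.incrementAndGet();
            } catch (InterruptedException e) {
                // A stray interrupt (for example one set by a logging call)
                // is ignored; the loop condition decides whether to exit.
            }
        }
    }

    public static void main(String[] args) {
        GuardedLauncher launcher = new GuardedLauncher();
        launcher.start();
        launcher.interrupt();                 // simulate an unexpected interrupt
        launcher.addTask(() -> System.out.println("task launched"));
        launcher.close();
    }
}
```

With this shape an unexpected interrupt costs one extra trip around the loop 
instead of silently killing the launcher. Note that close() may stop the 
thread before queued tasks run, so a real fix would also need to decide what 
happens to tasks still in the queue at shutdown.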

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
