[jira] Commented: (MAPREDUCE-2157) tasklauncher threads in TaskTracker can die because of unexpected interrupts

Joydeep Sen Sarma (JIRA) Tue, 26 Oct 2010 14:33:51 -0700

    [ 
https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925151#action_12925151
 ]


Joydeep Sen Sarma commented on MAPREDUCE-2157:
----------------------------------------------

see: https://issues.apache.org/bugzilla/show_bug.cgi?id=44157

With the patch for that bug, log4j sets the interrupted state for threads. I 
think the bug and its comments suggest that there may still be cases where 
InterruptedException is propagated from log4j (which is actually the more 
likely culprit for MAPREDUCE-2157). So it makes sense, as a precaution, to 
check for shutdown conditions inside InterruptedException handlers before 
exiting.
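
A minimal sketch of that precaution, not the actual TaskTracker code. The 
class LauncherSketch, the shuttingDown flag, and the launched counter are 
hypothetical names used only for illustration. The loop is driven by an 
explicit flag that only close() sets, and the InterruptedException handler 
returns only when that flag is set, so a stray interrupt (for example one left 
behind by a logging call) does not kill the thread:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a launcher thread; not the Hadoop class. */
class LauncherSketch extends Thread {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
    // Assumption: shutdown is signalled by a flag that only close() sets.
    private volatile boolean shuttingDown = false;
    final AtomicInteger launched = new AtomicInteger();

    void submit(Runnable task) {
        queue.add(task);
    }

    /** The only legitimate way to stop the thread. */
    void close() {
        shuttingDown = true;
        interrupt(); // wake the thread if it is blocked in take()
    }

    @Override
    public void run() {
        // Loop on the shutdown flag, not on Thread.interrupted().
        while (!shuttingDown) {
            try {
                Runnable task = queue.take(); // may throw InterruptedException
                task.run();
                launched.incrementAndGet();
            } catch (InterruptedException e) {
                if (shuttingDown) {
                    return; // genuine shutdown requested via close()
                }
                // Stray interrupt: take() cleared the flag when it threw,
                // so keep serving the queue.
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LauncherSketch launcher = new LauncherSketch();
        launcher.start();
        // Simulate a library call that leaves the interrupt flag set.
        launcher.submit(() -> Thread.currentThread().interrupt());
        launcher.submit(() -> { });
        Thread.sleep(200);
        System.out.println("alive after stray interrupt: " + launcher.isAlive());
        launcher.close();
        launcher.join();
        System.out.println("alive after close(): " + launcher.isAlive());
    }
}
```

The first submitted task only simulates an unexpected interrupt. In a real 
handler it would probably also be worth logging the stray interrupt so the 
source can be tracked down.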

> tasklauncher threads in TaskTracker can die because of unexpected interrupts
> ----------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>
> The taskLauncher thread exits on InterruptedException and on interrupt 
> conditions without checking for any shutdown flag:
>      while (!Thread.interrupted()) {
>        try {
>          ...
>        } catch (InterruptedException e) {
>          return; // ALL DONE
>        }
>      }
> If the interrupt happened for reasons other than TaskTracker.close(), the 
> TaskTracker will look functional but will no longer be able to schedule 
> tasks. Worse, some tasks (those in the launch queue) will hang indefinitely 
> in UNASSIGNED state (the JobTracker will not even time them out). We have 
> seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there 
> are many calls inside TaskLauncher). See for instance: 
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
