[ 
https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joydeep Sen Sarma updated MAPREDUCE-2157:
-----------------------------------------

    Status: Patch Available  (was: Open)

- Remove the 'interrupted()' check for termination (there was only one such 
case in the MR code base).
- Don't rely on InterruptedException to determine the thread termination 
condition. Always check (and set) a separate shutdown flag instead.
- Mark some threads as daemon threads where they were not previously marked so. 
This is to alleviate any concerns about threads now ignoring signals and not 
shutting down.
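
The three points above can be illustrated with a minimal sketch. This is not 
code from the attached patch: the class LauncherSketch, its field names and the 
100 ms poll interval are all hypothetical and chosen only for illustration. 
The loop exits solely on an explicit shutdown flag, treats an interrupt as a 
wake-up rather than a termination signal, and the thread is marked as a daemon.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a launcher-style thread; not the TaskTracker code.
class LauncherSketch extends Thread {
  // Explicit termination flag: the only condition that ends the loop.
  private volatile boolean shuttingDown = false;
  private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
  final AtomicInteger launched = new AtomicInteger();

  LauncherSketch() {
    setName("launcher-sketch");
    // Daemon thread: a missed shutdown cannot keep the JVM alive.
    setDaemon(true);
  }

  void submit(Runnable task) {
    queue.add(task);
  }

  // Set the flag first, then interrupt only to wake a blocked poll().
  void shutdown() {
    shuttingDown = true;
    interrupt();
  }

  @Override
  public void run() {
    while (!shuttingDown) {
      try {
        Runnable task = queue.poll(100, TimeUnit.MILLISECONDS);
        if (task != null) {
          task.run();
          launched.incrementAndGet();
        }
      } catch (InterruptedException e) {
        // An interrupt alone is not a reason to exit. Fall through and let
        // the loop condition re-check the shutdown flag.
      }
    }
  }
}
```

With this shape, a stray interrupt (for example one raised by a library call) 
leaves the thread running, and only shutdown() ends it.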

> safely handle InterruptedException and interrupted status in MR code
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>         Attachments: mapreduce-2157.1.patch
>
>
> The TaskLauncher thread exits on InterruptedException and on interrupted 
> status without checking any shutdown flag:
>      while (!Thread.interrupted()) {
>         ...
>         } catch (InterruptedException e) { 
>           return; // ALL DONE
>         }
>      }
> If the interrupt happened for reasons other than TaskTracker.close(), then 
> the TaskTracker will look functional but will no longer be able to schedule 
> tasks. Worse, some tasks (those in the launch queue) will hang indefinitely 
> in UNASSIGNED state (the JobTracker will not even time them out). We have 
> seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there 
> are many calls inside TaskLauncher). See for instance: 
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
