[ https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joydeep Sen Sarma updated MAPREDUCE-2157:
-----------------------------------------

    Summary: safely handle InterruptedException and interrupted status in MR code  (was: tasklauncher threads in TaskTracker can die because of unexpected interrupts)

Agreed - there are many cases where the handling of InterruptedException is unsafe in MR code. I will post a patch addressing all of these. Renaming to reflect altered scope.

> safely handle InterruptedException and interrupted status in MR code
> --------------------------------------------------------------------
>
>                 Key: MAPREDUCE-2157
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Joydeep Sen Sarma
>            Assignee: Joydeep Sen Sarma
>            Priority: Critical
>
> The taskLauncher thread exits on InterruptedException and on interrupt
> conditions without checking for any shutdown flag:
>       while (!Thread.interrupted()) {
>       ...
>         } catch (InterruptedException e) {
>           return; // ALL DONE
>         }
>       }
> If the interrupt happened for reasons other than TaskTracker.close(), then
> the TaskTracker will look functional but will not be able to schedule tasks
> anymore. Worse, some tasks (those in the launch queue) will hang indefinitely
> in UNASSIGNED state (the JobTracker will not even time them out). We have
> seen this cause jobs to hang indefinitely.
> It seems that the interrupted condition can be set by log4j (of which there
> are many calls inside TaskLauncher). See for instance:
> http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html
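
For illustration only, here is a minimal sketch of the kind of handling the description argues for: the worker loop treats an interrupt as a reason to exit only when an explicit shutdown flag has also been set. This is not the TaskTracker code and not the patch for this issue; the class SafeLauncher, the running flag, and the spuriousInterrupts counter are all hypothetical names chosen for the example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a launcher thread; not the real TaskLauncher.
class SafeLauncher implements Runnable {
    private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

    // Explicit shutdown flag: the only condition under which the loop exits.
    private volatile boolean running = true;

    // Counts interrupts that arrived while no shutdown was requested.
    final AtomicInteger spuriousInterrupts = new AtomicInteger();

    void submit(Runnable task) {
        queue.add(task);
    }

    // Request shutdown: set the flag first, then interrupt to unblock take().
    void shutdown(Thread worker) {
        running = false;
        worker.interrupt();
    }

    @Override
    public void run() {
        while (running) {
            try {
                Runnable task = queue.take(); // blocks; throws if interrupted
                task.run();
            } catch (InterruptedException e) {
                if (!running) {
                    return; // interrupted as part of a real shutdown
                }
                // Interrupt from some other source (a library call, say).
                // take() cleared the interrupted status when it threw, so
                // record the event and keep serving the queue.
                spuriousInterrupts.incrementAndGet();
            }
        }
    }
}
```

With this shape, a stray interrupt costs one extra loop iteration instead of silently killing the thread, and tasks already in the queue are still picked up. A task that merely sets the interrupted status without throwing is handled the same way, because the next take() call throws immediately and the loop checks the flag.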