[ https://issues.apache.org/jira/browse/MAPREDUCE-2157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Joydeep Sen Sarma updated MAPREDUCE-2157: ----------------------------------------- Status: Patch Available (was: Open) - get rid of 'interrupted()' check for termination (only one case in mr code base) - don't rely on interruptedexception to determine thread termination condition. always check (and set) separate shutdown flag to check termination condition. - mark some threads as daemon threads where they were not previously marked so. this is to alleviate any concerns around threads ignoring signals now and not shutting down. > safely handle InterruptedException and interrupted status in MR code > -------------------------------------------------------------------- > > Key: MAPREDUCE-2157 > URL: https://issues.apache.org/jira/browse/MAPREDUCE-2157 > Project: Hadoop Map/Reduce > Issue Type: Bug > Reporter: Joydeep Sen Sarma > Assignee: Joydeep Sen Sarma > Priority: Critical > Attachments: mapreduce-2157.1.patch > > > taskLauncher thread exits on interruptedException and on Interrupt conditions > without checking for any shutdown flag: > while (!Thread.interrupted()) { > ... > } catch (InterruptedException e) { > return; // ALL DONE > > } > } > If the interrupt happened because of reasons other than TaskTracker.close() - > then the TaskTracker will look functional - but will not be able to schedule > tasks anymore. worse - some tasks (that are in the launch queue) will hang > indefinitely un UNASSIGNED state (the JobTracker will not even time them > out). We have seen this cause jobs to hang indefinitely. > It seems that the interrupted condition can be set by log4j (of which there > are many calls inside TaskLauncher). See or instance: > http://logging.apache.org/log4j/1.2/xref/org/apache/log4j/AsyncAppender.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.