Ivan Mitic created HADOOP-9970:
----------------------------------

             Summary: TaskTracker hung after failed reconnect to the JobTracker
                 Key: HADOOP-9970
                 URL: https://issues.apache.org/jira/browse/HADOOP-9970
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 1.3.0
            Reporter: Ivan Mitic
            Assignee: Ivan Mitic

The TaskTracker hangs after a failed reconnect to the JobTracker. This is the problematic piece of code:

{code}
this.distributedCacheManager = new TrackerDistributedCacheManager(
    this.fConf, taskController);
this.distributedCacheManager.startCleanupThread();

this.jobClient = (InterTrackerProtocol)
    UserGroupInformation.getLoginUser().doAs(
        new PrivilegedExceptionAction<Object>() {
          public Object run() throws IOException {
            return RPC.waitForProxy(InterTrackerProtocol.class,
                InterTrackerProtocol.versionID,
                jobTrackAddr, fConf);
          }
        });
{code}

If RPC.waitForProxy() throws, the TrackerDistributedCacheManager cleanup thread is never stopped. Because it is a non-daemon thread, it keeps the TT process alive forever.
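
One possible way to avoid the hang is to stop the cleanup thread whenever the connection attempt fails. The sketch below only illustrates that idea and is not the actual patch: CacheManager, connect() and the proxyFactory parameter are hypothetical stand-ins for TrackerDistributedCacheManager, the TaskTracker initialization code and the RPC.waitForProxy() call, so that the example runs without Hadoop on the classpath.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class ReconnectSketch {

    /** Hypothetical stand-in for TrackerDistributedCacheManager. */
    static class CacheManager {
        private volatile boolean running = true;

        private final Thread cleanupThread = new Thread(() -> {
            while (running) {
                try {
                    Thread.sleep(50); // placeholder for periodic cleanup work
                } catch (InterruptedException e) {
                    return; // interrupted by stopCleanupThread()
                }
            }
        }, "cache-cleanup");

        void startCleanupThread() {
            // The thread inherits non-daemon status from its creator, so it
            // keeps the JVM alive until it exits, as in the report.
            cleanupThread.start();
        }

        void stopCleanupThread() throws InterruptedException {
            running = false;
            cleanupThread.interrupt();
            cleanupThread.join(); // wait until the thread has really exited
        }

        boolean isCleanupThreadAlive() {
            return cleanupThread.isAlive();
        }
    }

    /**
     * Starts the cleanup thread, then tries to obtain the proxy. If that
     * fails, the cleanup thread is stopped before the exception propagates.
     * proxyFactory is a hypothetical stand-in for the RPC.waitForProxy() call.
     */
    static <T> T connect(CacheManager manager, Callable<T> proxyFactory)
            throws Exception {
        manager.startCleanupThread();
        try {
            return proxyFactory.call();
        } catch (Exception e) {
            manager.stopCleanupThread();
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        CacheManager manager = new CacheManager();
        try {
            // Simulate a JobTracker that cannot be reached.
            connect(manager, () -> {
                throw new IOException("JobTracker unreachable");
            });
        } catch (IOException e) {
            System.out.println("connect failed: " + e.getMessage());
        }
        System.out.println("cleanup thread alive: "
            + manager.isCleanupThreadAlive());
    }
}
```

With the catch block in place, the process can exit after a failed reconnect. Without it, the cleanup thread would keep the JVM running, which matches the hang described above. Reordering the code so that the cleanup thread starts only after the proxy has been obtained might be another option, provided nothing in between depends on the thread already running.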