[ https://issues.apache.org/jira/browse/MAPREDUCE-5502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13771216#comment-13771216 ]
Jason Lowe commented on MAPREDUCE-5502:
---------------------------------------

Hmm, something must be wrong with the mapred client then, as it explicitly checks with the RM to see if the application is running and, if so, tries to connect to the AM to kill it. Looking deeper, it may be this code in YARNRunner.killJob:

{code}
/* check if the status is not running, if not send kill to RM */
JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
if (status.getState() != JobStatus.State.RUNNING) {
  try {
    resMgrDelegate.killApplication(TypeConverter.toYarn(arg0).getAppId());
  } catch (YarnException e) {
    throw new IOException(e);
  }
  return;
}
{code}

So in this scenario the AM has finished the job but has not unregistered yet. The AM is telling clients that connect to it that the job status is SUCCEEDED/FAILED/KILLED (i.e., not RUNNING but in some terminal state), but because the AM has yet to unregister with the RM, the RM is still directing clients to the AM when asked. The client therefore sees a non-RUNNING status and sends the kill to the RM. If the RM kills the app, I think there aren't a lot of options for getting history consistently, per the discussion above.

We could fix this particular scenario by having YARNRunner not try to kill the application if the reported status is already a terminal state. There's the risk of an insane AM that thinks the job is completed and continues to report that but refuses to unregister from the RM. mapred job -kill would then be ineffective at killing such an application. That seems an unlikely scenario in practice, and there's always yarn application -kill as a workaround if it did happen.

MAPREDUCE-5497 probably made the race window for this scenario very small in practice, as the AM no longer waits 5 seconds after the job completes before unregistering.
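To make the proposal concrete, here is a minimal, self-contained sketch of the decision logic, not the actual patch. It does not use the Hadoop classes: the `State` enum is a hypothetical stand-in for JobStatus.State, and `Action`, `currentBehavior` and `proposedBehavior` are names invented for illustration. It only shows which path the client would take for each reported status, assuming the terminal states are SUCCEEDED, FAILED and KILLED as described above.

```java
import java.util.EnumSet;
import java.util.Set;

final class KillJobSketch {
    /** Hypothetical stand-in for JobStatus.State; not the Hadoop class. */
    enum State { PREP, RUNNING, SUCCEEDED, FAILED, KILLED }

    /** Illustrative names for the three things the client could do. */
    enum Action { KILL_VIA_AM, KILL_VIA_RM, DO_NOTHING }

    /** States in which the job has already finished, per the comment above. */
    private static final Set<State> TERMINAL =
            EnumSet.of(State.SUCCEEDED, State.FAILED, State.KILLED);

    /** Mirrors the quoted check: anything that is not RUNNING goes to the RM. */
    static Action currentBehavior(State reported) {
        return reported == State.RUNNING ? Action.KILL_VIA_AM : Action.KILL_VIA_RM;
    }

    /** Proposed variant: skip the RM kill when the job is already terminal. */
    static Action proposedBehavior(State reported) {
        if (TERMINAL.contains(reported)) {
            return Action.DO_NOTHING;   // job is done; let the AM unregister normally
        }
        return currentBehavior(reported);
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + ": current=" + currentBehavior(s)
                    + ", proposed=" + proposedBehavior(s));
        }
    }
}
```

The only rows that differ between the two methods are the terminal states, which is exactly the window where the AM has finished the job but has not yet unregistered. The trade-off noted above still applies: with this variant, a job whose AM reports a terminal state forever would need to be killed through the RM directly.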

> History link in resource manager is broken for KILLED jobs
> ----------------------------------------------------------
>
>                 Key: MAPREDUCE-5502
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5502
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: resourcemanager
>    Affects Versions: 2.0.5-alpha
>            Reporter: Vrushali C
>            Assignee: Vrushali C
>              Labels: ui
>
> History link in resource manager is broken for KILLED jobs.
> Seems to happen with jobs with State 'KILLED' and FinalStatus 'KILLED'. If
> the State is 'FINISHED' and FinalStatus is 'KILLED', then the "History" link
> is fine.
> It isn't easy to reproduce the problem since the time at which the app is
> killed determines the state it ends up in, which is hard to guess. These
> particular jobs seem to get a Diagnostics message of "Application killed by
> user.", whereas the other killed jobs get " Kill Job received from client
> job_1378766187901_0002
> Job received Kill while in RUNNING state. "