[
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662011#comment-14662011
]
Jonathan Eagles commented on TEZ-2300:
--------------------------------------
The MapReduce implementation works in a blocking manner: it gives the AM
sufficient time for a normal shutdown and then follows up with a forced kill,
as Rohini suggests. This modification will give Pig, Hive, and other clients a
more functional API.
{code:title=YARNRunner.java}
@Override
public void killJob(JobID arg0) throws IOException, InterruptedException {
  /* check if the status is not running, if not send kill to RM */
  JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
  ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();
  // get status from RM and return
  if (status == null) {
    killUnFinishedApplication(appId);
    return;
  }
  if (status.getState() != JobStatus.State.RUNNING) {
    killApplication(appId);
    return;
  }
  try {
    /* send a kill to the AM */
    clientCache.getClient(arg0).killJob(arg0);
    long currentTimeMillis = System.currentTimeMillis();
    long timeKillIssued = currentTimeMillis;
    long killTimeOut =
        conf.getLong(MRJobConfig.MR_AM_HARD_KILL_TIMEOUT_MS,
            MRJobConfig.DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS);
    while ((currentTimeMillis < timeKillIssued + killTimeOut)
        && !isJobInTerminalState(status)) {
      try {
        Thread.sleep(1000L);
      } catch (InterruptedException ie) {
        /** interrupted, just break */
        break;
      }
      currentTimeMillis = System.currentTimeMillis();
      status = clientCache.getClient(arg0).getJobStatus(arg0);
      if (status == null) {
        killUnFinishedApplication(appId);
        return;
      }
    }
  } catch(IOException io) {
    LOG.debug("Error when checking for application status", io);
  }
  if (status != null && !isJobInTerminalState(status)) {
    killApplication(appId);
  }
}
{code}
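Stripped of the MapReduce specifics, the pattern above is: ask the AM to stop, poll until the application reaches a terminal state or a timeout expires, then fall back to a forced kill. The following is only a minimal sketch of that pattern, not the TezClient or YARN API. {{AppHandle}}, {{BlockingStop}}, and the {{timeoutMs}}/{{pollMs}} parameters are hypothetical names introduced for illustration, and the I/O error handling of the real code is omitted.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical handle on a running application; not a Tez or YARN type. */
interface AppHandle {
    void requestGracefulStop(); // ask the AM to shut down normally
    boolean isTerminal();       // true once the application has finished or been killed
    void forceKill();           // ask the RM to kill the application
}

final class BlockingStop {
    private BlockingStop() {}

    /**
     * Requests a graceful stop, polls until the application is terminal or the
     * timeout expires, and force-kills it if it is still running afterwards.
     *
     * @return true if the application stopped on its own, false if a forced
     *         kill had to be issued
     */
    static boolean stop(AppHandle app, long timeoutMs, long pollMs) {
        app.requestGracefulStop();
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (System.nanoTime() - deadline < 0) {
            if (app.isTerminal()) {
                return true;
            }
            try {
                Thread.sleep(pollMs);
            } catch (InterruptedException ie) {
                // Interrupted: stop waiting, but keep the interrupt flag set.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // Check one last time before escalating to a forced kill.
        if (app.isTerminal()) {
            return true;
        }
        app.forceKill();
        return false;
    }
}
```

A caller such as a shutdown hook could invoke {{BlockingStop.stop(handle, timeoutMs, pollMs)}} and would get control back within roughly the timeout, whether or not the AM responds.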
> TezClient.stop() takes a lot of time or does not work sometimes
> ---------------------------------------------------------------
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Attachments: syslog_dag_1428329756093_325099_1_post
>
>
> Noticed this with a couple of Pig scripts which were not behaving well (AM
> close to OOM, etc.) and even with some that were running fine. Pig calls
> TezClient.stop() in a shutdown hook. Ctrl+C to the Pig script either exits
> immediately or hangs. In both cases it takes a long time for the
> YARN application to go to KILLED state. Many times I just end up calling yarn
> application -kill separately after waiting for 5 minutes or more for it to get
> killed.