[ 
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662011#comment-14662011
 ] 

Jonathan Eagles commented on TEZ-2300:
--------------------------------------

The MapReduce implementation works in a blocking manner: it gives the AM 
sufficient time for a normal shutdown and then falls back to a forced kill 
through the RM, as Rohini suggests. Making TezClient.stop() behave the same 
way will give Pig, Hive, and other clients a more functional API.
{code:title=YARNRunner.java}
  @Override
  public void killJob(JobID arg0) throws IOException, InterruptedException {
    /* check if the status is not running, if not send kill to RM */
    JobStatus status = clientCache.getClient(arg0).getJobStatus(arg0);
    ApplicationId appId = TypeConverter.toYarn(arg0).getAppId();

    // get status from RM and return
    if (status == null) {
      killUnFinishedApplication(appId);
      return;
    }

    if (status.getState() != JobStatus.State.RUNNING) {
      killApplication(appId);
      return;
    }

    try {
      /* send a kill to the AM */
      clientCache.getClient(arg0).killJob(arg0);
      long currentTimeMillis = System.currentTimeMillis();
      long timeKillIssued = currentTimeMillis;
      long killTimeOut =
          conf.getLong(MRJobConfig.MR_AM_HARD_KILL_TIMEOUT_MS,
                       MRJobConfig.DEFAULT_MR_AM_HARD_KILL_TIMEOUT_MS);
      while ((currentTimeMillis < timeKillIssued + killTimeOut)
          && !isJobInTerminalState(status)) {
        try {
          Thread.sleep(1000L);
        } catch (InterruptedException ie) {
          /** interrupted, just break */
          break;
        }
        currentTimeMillis = System.currentTimeMillis();
        status = clientCache.getClient(arg0).getJobStatus(arg0);
        if (status == null) {
          killUnFinishedApplication(appId);
          return;
        }
      }
    } catch(IOException io) {
      LOG.debug("Error when checking for application status", io);
    }
    if (status != null && !isJobInTerminalState(status)) {
      killApplication(appId);
    }
  }
{code}
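
For illustration only, the same pattern reduced to its essentials might look like the sketch below. Everything in it is hypothetical: AppHandle, requestGracefulShutdown(), isTerminal(), forceKill() and the stop() helper are placeholder names, not the TezClient or YARN API, and the handle methods are assumed not to throw checked exceptions to keep the example short.

```java
/** Hypothetical sketch: graceful kill, bounded wait, then forced kill. Not the TezClient API. */
final class BlockingStopSketch {

  /** Hypothetical stand-in for a client that can reach both the AM and the RM. */
  interface AppHandle {
    void requestGracefulShutdown(); // ask the AM to shut down cleanly
    boolean isTerminal();           // has the application reached a final state?
    void forceKill();               // ask the RM to kill the application
  }

  /**
   * Blocks for at most roughly timeoutMs waiting for a clean shutdown.
   * Returns true if the application stopped on its own, false if a
   * forced kill had to be issued.
   */
  static boolean stop(AppHandle app, long timeoutMs, long pollMs) {
    app.requestGracefulShutdown();
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (System.nanoTime() - deadline < 0) {
      if (app.isTerminal()) {
        return true;
      }
      try {
        Thread.sleep(pollMs);
      } catch (InterruptedException ie) {
        // Preserve the interrupt flag and stop waiting, as the MR code does.
        Thread.currentThread().interrupt();
        break;
      }
    }
    // One last check so an app that just finished is not killed needlessly.
    if (app.isTerminal()) {
      return true;
    }
    app.forceKill();
    return false;
  }
}
```

A caller such as a shutdown hook could then invoke BlockingStopSketch.stop(handle, timeoutMs, 1000L) and know that, on return, the application has either finished or been sent a kill request.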

> TezClient.stop() takes a lot of time or does not work sometimes
> ---------------------------------------------------------------
>
>                 Key: TEZ-2300
>                 URL: https://issues.apache.org/jira/browse/TEZ-2300
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>         Attachments: syslog_dag_1428329756093_325099_1_post 
>
>
>   Noticed this with a couple of Pig scripts which were not behaving well (AM 
> close to OOM, etc.) and even with some that were running fine. Pig calls 
> TezClient.stop() in a shutdown hook. Ctrl+C to the Pig script either exits 
> immediately or hangs. In both cases it takes a long time for the 
> YARN application to go to KILLED state. Many times I just end up calling yarn 
> application -kill separately after waiting for 5 mins or more for it to get 
> killed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
