[
https://issues.apache.org/jira/browse/TEZ-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14498230#comment-14498230
]
Rohini Palaniswamy commented on TEZ-2300:
-----------------------------------------
There are a couple of issues with the behavior, based on talking to [~jlowe]
and comparing with what is done in MR:
- Kill is put in the event queue and is processed like any other event.
When there are millions of events in the queue, it takes a long time to get to
the kill, and I see the AM even scheduling new tasks in the meantime. MR also
does it this way. The problem is the sheer number of events, and TEZ-776
should reduce that. But with large jobs there are still going to be many
events in the queue.
- TezClient.stop() returns immediately after the kill. It should not; it
should poll and wait on the client side. MR does that.
- If the DAG is not killed and the session is not shut down even after a
certain timeout, yarn kill should be called. MR does that.
This is an important issue, as people might kill a script, think the
application is killed, and proceed with running a new one, which could cause a
lot of issues while the old one is still running. So the kill needs to be
synchronous and reliable.
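
To make the second and third points concrete, here is a rough sketch of the
client-side behavior I have in mind: request the kill, poll until the
application reaches a terminal state, and fall back to a forced kill after a
timeout. This is only an illustration, not a patch. The class name SyncKill,
the method stopAndWait and the three callbacks are hypothetical and are not
existing Tez or YARN APIs; in a real client they would wrap whatever kill,
status and yarn kill calls the client already has.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper; none of these names come from Tez or YARN. */
final class SyncKill {
    private SyncKill() {}

    /**
     * Requests a graceful kill, then polls until the application reports a
     * terminal state or the timeout expires. On timeout (or if the waiting
     * thread is interrupted) it falls back to the forced kill.
     *
     * @return true if the application stopped on its own, false if forced
     */
    static boolean stopAndWait(Runnable requestKill,
                               BooleanSupplier isTerminated,
                               Runnable forceKill,
                               long timeoutMillis,
                               long pollMillis) {
        requestKill.run();
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (System.nanoTime() - deadline < 0) {
            if (isTerminated.getAsBoolean()) {
                return true;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
        }
        // One last status check before escalating.
        if (isTerminated.getAsBoolean()) {
            return true;
        }
        forceKill.run();
        return false;
    }
}
```

A shutdown hook would call stopAndWait with a timeout chosen by the client, so
that control returns to the user only once the application is really gone.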
> TezClient.stop() takes a lot of time or does not work sometimes
> ---------------------------------------------------------------
>
> Key: TEZ-2300
> URL: https://issues.apache.org/jira/browse/TEZ-2300
> Project: Apache Tez
> Issue Type: Bug
> Reporter: Rohini Palaniswamy
> Attachments: syslog_dag_1428329756093_325099_1_post
>
>
> Noticed this with a couple of Pig scripts which were not behaving well (AM
> close to OOM, etc.) and even with some that were running fine. Pig calls
> TezClient.stop() in a shutdown hook. Ctrl+C to the Pig script either exits
> immediately or hangs. In both cases it takes a long time for the
> yarn application to go to KILLED state. Many times I just end up calling yarn
> application -kill separately after waiting for 5 mins or more for it to get
> killed.