[ 
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704141#comment-14704141
 ] 

Bikas Saha commented on TEZ-2687:
---------------------------------

Typo: "Realease" should be "Release". Also, the log message should probably now be "initiating stop".
{code}+  public synchronized void initiateStop() {
+    // release held containers
+    LOG.info("Realease held containers");
+    isStopStarted.set(true);{code}

This is probably going to cause a ConcurrentModificationException, since removeTaskRequest presumably modifies taskRequests while its keySet() is being iterated.
{code}+    // remove taskRequest from AMRMClient to avoid allocating new containers in the next heartbeat
+    LOG.info("Remove all the taskRequests");
+    for (Object task : taskRequests.keySet()) {
+      removeTaskRequest(task);
+    }{code}
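
One possible way to avoid the problem, shown only as a minimal sketch: iterate over a snapshot of the keys rather than the live keySet(). The class below is a hypothetical stand-in, not the actual scheduler code. It assumes taskRequests is a plain HashMap and that removeTaskRequest removes the entry from that same map; addTaskRequest and pendingCount are illustrative helpers that do not come from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduler; names other than
// taskRequests and removeTaskRequest are illustrative only.
class TaskRequestRemovalSketch {
    // Assumption: the real taskRequests is a non-concurrent map keyed by task.
    private final Map<Object, String> taskRequests = new HashMap<>();

    synchronized void addTaskRequest(Object task, String request) {
        taskRequests.put(task, request);
    }

    // Assumption: the real removeTaskRequest mutates taskRequests like this.
    synchronized void removeTaskRequest(Object task) {
        taskRequests.remove(task);
    }

    // Copy the keys first so that removals do not invalidate the iterator.
    synchronized void removeAllTaskRequests() {
        List<Object> snapshot = new ArrayList<>(taskRequests.keySet());
        for (Object task : snapshot) {
            removeTaskRequest(task);
        }
    }

    synchronized int pendingCount() {
        return taskRequests.size();
    }

    public static void main(String[] args) {
        TaskRequestRemovalSketch scheduler = new TaskRequestRemovalSketch();
        scheduler.addTaskRequest("task1", "req1");
        scheduler.addTaskRequest("task2", "req2");
        scheduler.addTaskRequest("task3", "req3");
        scheduler.removeAllTaskRequests();
        System.out.println("pending after stop: " + scheduler.pendingCount());
    }
}
```

Copying the keys costs one extra list allocation, but it keeps the loop independent of how removeTaskRequest is implemented. Whether this fits the patch depends on what else removeTaskRequest touches.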

The test should allocate only 2 containers so that 1 task request is still 
pending when initiateStop() is called. That way we can also verify that the 
pending task requests are removed from the AMRMClient.

> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
>                 Key: TEZ-2687
>                 URL: https://issues.apache.org/jira/browse/TEZ-2687
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.2, 0.8.0, 0.7.1
>            Reporter: Gopal V
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch
>
>
> When ATS goes into a GC pause under heavy loads and while it recovers, each 
> Tez AM holds onto a few containers even though it is shutting down and will 
> never accept any more DAGs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
