[jira] [Comment Edited] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

Bikas Saha (JIRA) Wed, 19 Aug 2015 18:48:07 -0700

    [ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704141#comment-14704141 ]


Bikas Saha edited comment on TEZ-2687 at 8/20/15 1:46 AM:
----------------------------------------------------------

Typo ("Realease"). Also, the log message should probably now be "initiating stop".
{code}+  public synchronized void initiateStop() {
+    // release held containers
+    LOG.info("Realease held containers");
+    isStopStarted.set(true);{code}

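This is probably going to cause a ConcurrentModificationException, since removeTaskRequest presumably removes entries from taskRequests while its keySet is being iterated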
{code}+    // remove taskRequest from AMRMClient to avoid allocating new containers in the next heartbeat
+    LOG.info("Remove all the taskRequests");
+    for (Object task : taskRequests.keySet()) {
+      removeTaskRequest(task);
+    }{code}
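
One possible way to avoid the concurrent modification noted above is to iterate over a snapshot of the keys instead of the live keySet view. The sketch below is only an illustration under assumptions: the class TaskRequestRemoval, the HashMap-backed taskRequests field, and the body of removeTaskRequest are hypothetical stand-ins, not the code from the patch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the scheduler; not the actual Tez class.
class TaskRequestRemoval {
  // Assumption: taskRequests is a plain HashMap keyed by task.
  private final Map<Object, String> taskRequests = new HashMap<>();

  void addTaskRequest(Object task, String request) {
    taskRequests.put(task, request);
  }

  // Assumption: removing a task request deletes its entry from the map.
  void removeTaskRequest(Object task) {
    taskRequests.remove(task);
  }

  void removeAllTaskRequests() {
    // Copy the keys first, so removals do not invalidate the iterator
    // of the live keySet view.
    List<Object> snapshot = new ArrayList<>(taskRequests.keySet());
    for (Object task : snapshot) {
      removeTaskRequest(task);
    }
  }

  int pendingCount() {
    return taskRequests.size();
  }
}
```

Copying the keys costs one extra list allocation, but it keeps removeTaskRequest unchanged. Whether this fits the patch depends on how taskRequests is actually declared and synchronized.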

The test should allocate only 2 containers so that 1 task request is still 
pending when initiateStop is called. That way we can also verify that the 
pending task requests are removed from the AMRMClient.

The new config var is unused. Why add it now?


> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
>                 Key: TEZ-2687
>                 URL: https://issues.apache.org/jira/browse/TEZ-2687
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.2, 0.8.0, 0.7.1
>            Reporter: Gopal V
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch
>
>
> When ATS goes into a GC pause under heavy loads and while it recovers, each 
> Tez AM holds onto a few containers even though it is shutting down and will 
> never accept any more DAGs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
