[jira] [Commented] (TEZ-2687) ATS History shutdown happens before the min-held containers are released

Bikas Saha (JIRA) Wed, 19 Aug 2015 11:41:47 -0700

    [ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703535#comment-14703535 ]


Bikas Saha commented on TEZ-2687:
---------------------------------

It looks like the approach in the patch is to have the scheduler begin 
stopping early, so that the actual stop is effectively only used to 
unregister from the RM. Right?

In that case, renaming releaseHeldContainers() -> initiateStop() would be 
clearer.

Why add more logs?
{code}
   public void onContainersCompleted(List<ContainerStatus> statuses) {
-    if (isStopped.get()) {
+    if (isStopStarted.get()) {
+      for (ContainerStatus status : statuses) {
+        LOG.info("Container " + status.getContainerId() + " is completed");
+      }
{code}

This should probably also collect all pending items in taskRequests and 
removeTaskRequest() for them. This will ensure that the RM allocation table for 
this AM is cleared and we will avoid getting more allocations after the next 
heartbeat.
{code}
+  @Override
+  public void releaseHeldContainers() {
+    LOG.info("Realease held containers");
+    synchronized (this) {
+      isStopStarted.set(true);
+      // Create a new list for containerIds to iterate, otherwise it would cause ConcurrentModificationException
+      // because method releaseContainer will change heldContainers.
+      List<ContainerId> heldContainerIds = new ArrayList<ContainerId>(heldContainers.size());
+      for (ContainerId containerId : heldContainers.keySet()) {
+        heldContainerIds.add(containerId);
+      }
+      for (ContainerId containerId : heldContainerIds) {
+        releaseContainer(containerId);
+      }
{code}
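
A minimal sketch of the suggestion above, using stand-in types rather than the 
real scheduler: once stop is initiated, release every held container and also 
drop every pending task request, iterating over snapshots so that the removal 
methods can modify the underlying maps safely. The class StopSketchScheduler, 
its String-keyed maps, the holdContainer()/addTaskRequest() helpers and the 
two recording lists are hypothetical; only the names initiateStop(), 
releaseContainer(), removeTaskRequest(), heldContainers, taskRequests and 
isStopStarted echo the patch and this comment. In the real scheduler the two 
removal methods would talk to the RM client, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the task scheduler; not the actual Tez class.
class StopSketchScheduler {
    private final AtomicBoolean isStopStarted = new AtomicBoolean(false);

    // Assumption: plain String keys stand in for ContainerId and task objects.
    private final Map<String, String> heldContainers = new LinkedHashMap<>();
    private final Map<String, String> taskRequests = new LinkedHashMap<>();

    // Records what was handed back, so the effect is observable in this sketch.
    final List<String> releasedContainers = new ArrayList<>();
    final List<String> removedRequests = new ArrayList<>();

    synchronized void holdContainer(String containerId) {
        heldContainers.put(containerId, "held");
    }

    synchronized void addTaskRequest(String task) {
        taskRequests.put(task, "pending");
    }

    // Marks the scheduler as stopping, then clears held containers and
    // pending requests so nothing new should be allocated afterwards.
    synchronized void initiateStop() {
        isStopStarted.set(true);

        // Iterate over copies of the key sets: releaseContainer() and
        // removeTaskRequest() mutate the maps, which would otherwise throw
        // ConcurrentModificationException during iteration.
        for (String containerId : new ArrayList<>(heldContainers.keySet())) {
            releaseContainer(containerId);
        }
        for (String task : new ArrayList<>(taskRequests.keySet())) {
            removeTaskRequest(task);
        }
    }

    private void releaseContainer(String containerId) {
        heldContainers.remove(containerId);
        releasedContainers.add(containerId);
    }

    private void removeTaskRequest(String task) {
        taskRequests.remove(task);
        removedRequests.add(task);
    }

    boolean isStopStarted() {
        return isStopStarted.get();
    }

    synchronized int heldCount() {
        return heldContainers.size();
    }

    synchronized int pendingCount() {
        return taskRequests.size();
    }
}
```

Doing both removals under the same lock that sets isStopStarted keeps the 
"stopping" state and the emptied maps consistent for any callback that checks 
the flag afterwards.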

Tests?

> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
>                 Key: TEZ-2687
>                 URL: https://issues.apache.org/jira/browse/TEZ-2687
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.2, 0.8.0, 0.7.1
>            Reporter: Gopal V
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2687-1.patch
>
>
> When ATS goes into a GC pause under heavy loads and while it recovers, each 
> Tez AM holds onto a few containers even though it is shutting down and will 
> never accept any more DAGs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
