[
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703535#comment-14703535
]
Bikas Saha commented on TEZ-2687:
---------------------------------
Looks like the approach in the patch is to have the scheduler begin stopping
first, and the actual stop is then only used to unregister from the RM.
Right?
In that case, renaming releaseHeldContainers() -> initiateStop() would be
clearer.
Why add more logs?
{code}
  public void onContainersCompleted(List<ContainerStatus> statuses) {
-   if (isStopped.get()) {
+   if (isStopStarted.get()) {
+     for (ContainerStatus status : statuses) {
+       LOG.info("Container " + status.getContainerId() + " is completed");
+     }
{code}
The method below should probably also collect all pending items in
taskRequests and call removeTaskRequest() for each of them. This will ensure
that the RM allocation table for this AM is cleared, so we avoid getting more
allocations after the next heartbeat.
{code}
+ @Override
+ public void releaseHeldContainers() {
+   LOG.info("Realease held containers");
+   synchronized (this) {
+     isStopStarted.set(true);
+     // Create a new list for containerIds to iterate, otherwise it would cause ConcurrentModificationException
+     // because method releaseContainer will change heldContainers.
+     List<ContainerId> heldContainerIds = new ArrayList<ContainerId>(heldContainers.size());
+     for (ContainerId containerId : heldContainers.keySet()) {
+       heldContainerIds.add(containerId);
+     }
+     for (ContainerId containerId : heldContainerIds) {
+       releaseContainer(containerId);
+     }
{code}
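A rough sketch of what I mean, combining the rename with clearing pending
requests. This is not the Tez scheduler: StopSketch, the String keys standing
in for ContainerId and task objects, and the rmCalls list are all made up for
illustration, and the two helper methods only record what the real ones would
send to the RM.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for the task scheduler, for illustration only.
class StopSketch {
    final AtomicBoolean isStopStarted = new AtomicBoolean(false);
    // Strings stand in for ContainerId and task/request objects (assumption).
    final Map<String, String> heldContainers = new LinkedHashMap<>();
    final Map<String, String> taskRequests = new LinkedHashMap<>();
    // Records the calls that the real methods would make towards the RM.
    final List<String> rmCalls = new ArrayList<>();

    void releaseContainer(String containerId) {
        heldContainers.remove(containerId);
        rmCalls.add("release " + containerId);
    }

    void removeTaskRequest(String task) {
        taskRequests.remove(task);
        rmCalls.add("remove-request " + task);
    }

    // Proposed name for releaseHeldContainers(): starts the stop sequence.
    public synchronized void initiateStop() {
        isStopStarted.set(true);
        // Copy the key sets before iterating: both helpers mutate the maps,
        // which would otherwise throw ConcurrentModificationException.
        for (String task : new ArrayList<>(taskRequests.keySet())) {
            removeTaskRequest(task);
        }
        for (String containerId : new ArrayList<>(heldContainers.keySet())) {
            releaseContainer(containerId);
        }
    }

    public static void main(String[] args) {
        StopSketch scheduler = new StopSketch();
        scheduler.taskRequests.put("task-1", "request-1");
        scheduler.heldContainers.put("container-1", "idle");
        scheduler.heldContainers.put("container-2", "idle");
        scheduler.initiateStop();
        System.out.println(scheduler.rmCalls);
    }
}
```

Pending requests are removed before containers are released so that nothing
is left asking the RM for more allocations while the held containers drain.
Passing the key set to the ArrayList constructor also replaces the manual
copy loop in the patch.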
Tests?
> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
> Key: TEZ-2687
> URL: https://issues.apache.org/jira/browse/TEZ-2687
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.2, 0.8.0, 0.7.1
> Reporter: Gopal V
> Assignee: Jeff Zhang
> Attachments: TEZ-2687-1.patch
>
>
> When ATS goes into a GC pause under heavy load, each Tez AM holds onto a few
> containers while ATS recovers, even though the AM is shutting down and will
> never accept any more DAGs.