[
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704141#comment-14704141
]
Bikas Saha edited comment on TEZ-2687 at 8/20/15 1:46 AM:
----------------------------------------------------------
Typo ("Realease" should be "Release"). And the log message should probably now be "initiating stop".
{code}+ public synchronized void initiateStop() {
+ // release held containers
+ LOG.info("Realease held containers");
+ isStopStarted.set(true);{code}
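This is probably going to cause a ConcurrentModificationException, assuming removeTaskRequest removes entries from taskRequests while its keySet is being iterated.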
{code}+ // remove taskRequest from AMRMClient to avoid allocating new containers in the next heartbeat
+ LOG.info("Remove all the taskRequests");
+ for (Object task : taskRequests.keySet()) {
+ removeTaskRequest(task);
+ }{code}
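A minimal sketch of the concern with the loop above, assuming taskRequests is a plain HashMap and removeTaskRequest removes the entry from that same map (the patch excerpt confirms neither). The class below is a hypothetical stand-in, not Tez code. It shows the in-place loop failing and one possible workaround, iterating over a snapshot of the keys.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the scheduler; not the actual Tez class.
class TaskRequestRemovalSketch {
    // Assumption: taskRequests is a non-concurrent HashMap.
    private final Map<Object, String> taskRequests = new HashMap<>();

    TaskRequestRemovalSketch() {
        taskRequests.put("task1", "request1");
        taskRequests.put("task2", "request2");
        taskRequests.put("task3", "request3");
    }

    // Assumption: removing a task request mutates the backing map.
    void removeTaskRequest(Object task) {
        taskRequests.remove(task);
    }

    // Same shape as the loop in the patch: removes while iterating keySet().
    // Returns true if the fail-fast iterator detected the modification.
    boolean removeWhileIterating() {
        try {
            for (Object task : taskRequests.keySet()) {
                removeTaskRequest(task);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // One possible fix: copy the keys first, then remove from the map.
    void removeViaSnapshot() {
        for (Object task : new ArrayList<>(taskRequests.keySet())) {
            removeTaskRequest(task);
        }
    }

    int pending() {
        return taskRequests.size();
    }

    public static void main(String[] args) {
        TaskRequestRemovalSketch inPlace = new TaskRequestRemovalSketch();
        System.out.println("in-place removal failed: " + inPlace.removeWhileIterating());

        TaskRequestRemovalSketch snapshot = new TaskRequestRemovalSketch();
        snapshot.removeViaSnapshot();
        System.out.println("pending after snapshot removal: " + snapshot.pending());
    }
}
```

If taskRequests turns out to be a concurrent collection, or removeTaskRequest does not touch it, the in-place loop may be fine. The snapshot copy is only one option; removing through an explicit Iterator would also work.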
The test should allocate only 2 containers so that 1 task request is still
pending when initiateStop is called. That way we can also verify that the
pending task requests are removed at the AMRMClient.
The new config var is unused. Why add it now?
was (Author: bikassaha):
Typo. And log message should probably now be "initiating stop"
{code}+ public synchronized void initiateStop() {
+ // release held containers
+ LOG.info("Realease held containers");
+ isStopStarted.set(true);{code}
This is probably going to cause concurrent access modification
{code}+ // remove taskRequest from AMRMClient to avoid allocating new containers in the next heartbeat
+ LOG.info("Remove all the taskRequests");
+ for (Object task : taskRequests.keySet()) {
+ removeTaskRequest(task);
+ }{code}
The test should allocate only 2 containers so that 1 task request is still
pending when initiateStop is called. That way we can also verify that the
pending task requests are removed at the AMRMClient.
> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
> Key: TEZ-2687
> URL: https://issues.apache.org/jira/browse/TEZ-2687
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.2, 0.8.0, 0.7.1
> Reporter: Gopal V
> Assignee: Jeff Zhang
> Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch
>
>
> When ATS goes into a GC pause under heavy load, each Tez AM holds onto a few
> containers while ATS recovers, even though the AM is shutting down and will
> never accept any more DAGs.