[
https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704129#comment-14704129
]
Jeff Zhang edited comment on TEZ-2687 at 8/20/15 1:34 AM:
----------------------------------------------------------
* Renaming releaseHeldContainers() -> initiateStop() would be clearer.
Fixed.
* Why add more logs?
To help verify that containers are released when stop is initiated.
* This should probably also collect all pending items in taskRequests and
call removeTaskRequest() for them. This will ensure that the RM allocation
table for this AM is cleared, so we avoid getting more allocations after the
next heartbeat.
Fixed.
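
To illustrate the stop sequence discussed in the points above, here is a minimal
sketch. It is not the code from the patch: StopSketch, holdContainer(),
addTaskRequest(), heldCount(), pendingCount() and the log list are hypothetical
stand-ins, and only the names initiateStop(), removeTaskRequest() and
taskRequests come from the review comments. It assumes that stopping means
releasing every held container and then withdrawing every pending request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for the scheduler; not the actual Tez class. */
final class StopSketch {
    // Containers the AM is holding idle (for example the min-held set).
    private final Set<String> heldContainers = new LinkedHashSet<>();
    // Pending task requests, keyed by task id, not yet matched to a container.
    private final Map<String, String> taskRequests = new LinkedHashMap<>();
    // Records what was handed back so the behavior can be verified.
    final List<String> log = new ArrayList<>();

    void holdContainer(String containerId) {
        heldContainers.add(containerId);
    }

    void addTaskRequest(String taskId, String resource) {
        taskRequests.put(taskId, resource);
    }

    void removeTaskRequest(String taskId) {
        if (taskRequests.remove(taskId) != null) {
            log.add("removed request " + taskId);
        }
    }

    /** Release everything held, then withdraw everything still pending. */
    void initiateStop() {
        for (String id : new ArrayList<>(heldContainers)) {
            heldContainers.remove(id);
            log.add("released container " + id);
        }
        // Copy the keys first: removeTaskRequest() mutates the map.
        for (String taskId : new ArrayList<>(taskRequests.keySet())) {
            removeTaskRequest(taskId);
        }
    }

    int heldCount() {
        return heldContainers.size();
    }

    int pendingCount() {
        return taskRequests.size();
    }
}
```

Logging each release and each removed request is what makes it possible to
confirm, from the AM log, that nothing is still held or requested once stop
has been initiated.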
Also added a new configuration, TEZ_TEST_HISTORY_SERVICE_STOP_SLEEP_SECS, so
that system tests can simulate the ATS hang behavior. [~bikassaha] Please help
review.
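
As a rough illustration of how a test-only stop delay can simulate a hung
history service, here is a minimal sketch. The class SlowStopHistoryService and
the key string "example.test.history.stop.sleep.secs" are hypothetical; the
actual key behind TEZ_TEST_HISTORY_SERVICE_STOP_SLEEP_SECS and the way the
patch wires it in are not shown in this message.

```java
import java.util.Map;
import java.util.concurrent.TimeUnit;

/** Hypothetical history service whose stop() can be slowed down for tests. */
final class SlowStopHistoryService {
    // Hypothetical key; the real key string is not given in this message.
    static final String STOP_SLEEP_SECS_KEY = "example.test.history.stop.sleep.secs";

    private final long stopSleepSecs;
    private boolean stopped;

    SlowStopHistoryService(Map<String, String> conf) {
        // Default of 0 means no artificial delay outside of tests.
        this.stopSleepSecs = Long.parseLong(conf.getOrDefault(STOP_SLEEP_SECS_KEY, "0"));
    }

    /** Blocks for the configured number of seconds before reporting stopped. */
    void stop() {
        if (stopSleepSecs > 0) {
            try {
                TimeUnit.SECONDS.sleep(stopSleepSecs);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and finish stopping.
                Thread.currentThread().interrupt();
            }
        }
        stopped = true;
    }

    boolean isStopped() {
        return stopped;
    }
}
```

A system test could set the delay to a few seconds and then check that
containers are released before stop() returns, rather than after it.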
> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
> Key: TEZ-2687
> URL: https://issues.apache.org/jira/browse/TEZ-2687
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.6.2, 0.8.0, 0.7.1
> Reporter: Gopal V
> Assignee: Jeff Zhang
> Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch
>
>
> When ATS goes into a GC pause under heavy load, each Tez AM holds onto a few
> containers while ATS recovers, even though the AM is shutting down and will
> never accept any more DAGs.