[ https://issues.apache.org/jira/browse/TEZ-2687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704129#comment-14704129 ]

Jeff Zhang edited comment on TEZ-2687 at 8/20/15 1:34 AM:
----------------------------------------------------------

* Renaming releaseHeldContainers() -> initiateStop() would be clearer.
Fixed.
* Why add more logs?
To help verify that containers are released when the stop is initiated.
* This should probably also collect all pending items in taskRequests and 
call removeTaskRequest() for them. This will ensure that the RM allocation 
table for this AM is cleared and we will avoid getting more allocations after 
the next heartbeat.
Fixed.
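The stop path described in the responses above can be sketched roughly as follows. This is a minimal, self-contained Java illustration, not the code in TEZ-2687-2.patch: `SchedulerStopSketch`, `TaskRequest`, the string container IDs and the `released`/`removed` bookkeeping lists are hypothetical stand-ins for the real scheduler and RM client types. It only assumes, as the review comment says, that calling removeTaskRequest() for every pending item is what clears this AM's entries in the RM allocation table.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a pending container request (not a Tez/YARN type).
class TaskRequest {
    final String taskId;

    TaskRequest(String taskId) {
        this.taskId = taskId;
    }
}

// Hypothetical, simplified scheduler that only models the stop sequence.
class SchedulerStopSketch {
    // Containers the AM keeps holding (e.g. min-held containers for reuse).
    final List<String> heldContainers = new ArrayList<>();
    // Outstanding container requests, keyed by task id.
    final Map<String, TaskRequest> taskRequests = new LinkedHashMap<>();
    // What would have been sent to the RM, recorded so the effect is observable.
    final List<String> released = new ArrayList<>();
    final List<String> removed = new ArrayList<>();

    void releaseContainer(String containerId) {
        // A real scheduler would ask the RM to release this container here.
        released.add(containerId);
    }

    void removeTaskRequest(TaskRequest request) {
        // A real scheduler would withdraw the matching ask from the RM here.
        removed.add(request.taskId);
    }

    // Previously named releaseHeldContainers(); renamed per the review.
    synchronized void initiateStop() {
        // 1. Hand back every container the AM was holding on to.
        for (String containerId : heldContainers) {
            releaseContainer(containerId);
        }
        heldContainers.clear();

        // 2. Withdraw every pending request. In the real AM this is what clears
        //    the RM allocation table, so the next heartbeat brings no new
        //    allocations. Iterate over a copy in case removeTaskRequest()
        //    also edits the map.
        for (TaskRequest request : new ArrayList<>(taskRequests.values())) {
            removeTaskRequest(request);
        }
        taskRequests.clear();
    }
}
```

After initiateStop() returns, both the held-container list and the pending-request map are empty, so in this model there is nothing left for a later heartbeat to ask the RM for.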

Also added a new configuration, TEZ_TEST_HISTORY_SERVICE_STOP_SLEEP_SECS, to 
help system tests simulate the ATS hang behavior. [~bikassaha] Please help 
review.
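Going by the issue title, the point of the fix is that the AM gives its containers back before the possibly slow ATS history shutdown runs, and the new test-only setting lets a system test make that shutdown artificially slow. The sketch below illustrates that ordering under stated assumptions: `AmShutdownSketch`, `HistoryServiceSketch` and their fields are hypothetical, and the sleep value is passed in directly rather than read from a Tez configuration object, because this sketch does not assume the property string behind TEZ_TEST_HISTORY_SERVICE_STOP_SLEEP_SECS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical history service whose stop() can be slowed down on purpose,
// the way a test-only setting such as TEZ_TEST_HISTORY_SERVICE_STOP_SLEEP_SECS
// is meant to simulate an ATS hang. Not the real Tez history service.
class HistoryServiceSketch {
    private final long stopSleepSecs;
    private final List<String> events;

    HistoryServiceSketch(long stopSleepSecs, List<String> events) {
        this.stopSleepSecs = stopSleepSecs;
        this.events = events;
    }

    void stop() {
        if (stopSleepSecs > 0) {
            try {
                // Pretend ATS is stuck (e.g. in a long GC pause) during shutdown.
                TimeUnit.SECONDS.sleep(stopSleepSecs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        events.add("history stopped");
    }
}

// Hypothetical AM shutdown that frees cluster resources before waiting on history.
class AmShutdownSketch {
    final List<String> events = new ArrayList<>();

    void initiateStop() {
        // Stand-in for releasing held containers and removing pending requests.
        events.add("containers released");
    }

    void shutdown(long historyStopSleepSecs) {
        initiateStop(); // containers go back to the RM first...
        new HistoryServiceSketch(historyStopSleepSecs, events).stop(); // ...then the slow part
    }
}
```

With a non-zero sleep, a system test could check that the containers are already released while the history stop is still blocked; with the reverse ordering they would stay held for the whole sleep.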






> ATS History shutdown happens before the min-held containers are released
> ------------------------------------------------------------------------
>
>                 Key: TEZ-2687
>                 URL: https://issues.apache.org/jira/browse/TEZ-2687
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.6.2, 0.8.0, 0.7.1
>            Reporter: Gopal V
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2687-1.patch, TEZ-2687-2.patch
>
>
> When ATS goes into a GC pause under heavy load, each Tez AM holds onto a few 
> containers while ATS recovers, even though the AM is shutting down and will 
> never accept any more DAGs.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
