[ 
https://issues.apache.org/jira/browse/MESOS-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722951#comment-14722951
 ] 

Yong Qiao Wang commented on MESOS-3324:
---------------------------------------

My proposal to address this resource leak:
1. Add a timeout (for example, --framework_reregister_timeout) for framework 
reregistration;
2. Add a new libprocess object to manage orphaned tasks and executors. It 
will:
    - After a Mesos master restart, clean up the tasks and executors that are 
still orphaned once --framework_reregister_timeout has elapsed;
    - While the Mesos master is running, periodically clean up tasks and 
executors that have been orphaned for at least --framework_reregister_timeout.
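
To make the proposal concrete, here is a minimal sketch of the bookkeeping 
such an object might do. It is written in Python for brevity and is not Mesos 
code: OrphanTracker, mark_orphaned, framework_reregistered and 
collect_expired are hypothetical names, and time is passed in explicitly 
instead of being read from a clock. A real implementation would live in the 
master as a libprocess process and would send kill requests for whatever 
collect_expired returns.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OrphanTracker:
    """Hypothetical tracker for orphaned tasks; not part of Mesos."""

    # Seconds; stands in for the proposed --framework_reregister_timeout.
    reregister_timeout: float
    # framework_id -> {task_id: time the task was first seen orphaned}
    orphans: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def mark_orphaned(self, framework_id: str, task_id: str, now: float) -> None:
        """Record a task whose framework is not registered with the master."""
        tasks = self.orphans.setdefault(framework_id, {})
        # Keep the earliest timestamp if the task is reported more than once.
        tasks.setdefault(task_id, now)

    def framework_reregistered(self, framework_id: str) -> None:
        """The framework came back, so its tasks are no longer orphans."""
        self.orphans.pop(framework_id, None)

    def collect_expired(self, now: float) -> List[str]:
        """Remove and return the tasks orphaned for at least the timeout."""
        expired = []
        for framework_id in list(self.orphans):
            tasks = self.orphans[framework_id]
            for task_id in list(tasks):
                if now - tasks[task_id] >= self.reregister_timeout:
                    expired.append(task_id)
                    del tasks[task_id]
            if not tasks:
                del self.orphans[framework_id]
        return sorted(expired)


# After a master restart, two frameworks have not reregistered yet.
tracker = OrphanTracker(reregister_timeout=60.0)
tracker.mark_orphaned("framework-a", "task-1", now=0.0)
tracker.mark_orphaned("framework-b", "task-2", now=0.0)

# framework-b reregisters in time; framework-a never does.
tracker.framework_reregistered("framework-b")

# A periodic cleanup pass after the timeout picks up only framework-a's task.
to_kill = tracker.collect_expired(now=61.0)
```

The same collect_expired pass would cover both cases in the proposal: the 
first pass after a master restart and the periodic passes while the master is 
running.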

> Resource leak issue in Mesos
> ----------------------------
>
>                 Key: MESOS-3324
>                 URL: https://issues.apache.org/jira/browse/MESOS-3324
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Yong Qiao Wang
>            Assignee: Yong Qiao Wang
>            Priority: Critical
>
> In the Mesos master recovery case, if a framework exits during the master's 
> downtime after it has already launched some long-running tasks, then once 
> the master recovers, those tasks keep running as orphaned tasks in the Mesos 
> cluster, and no other component can kill them later. This is a resource 
> leak in Mesos. I propose adding a timeout after which the Mesos master kills 
> those orphaned tasks or executors.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
