[
https://issues.apache.org/jira/browse/MESOS-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14722951#comment-14722951
]
Yong Qiao Wang commented on MESOS-3324:
---------------------------------------
My proposal to address this resource leak:
1. Add a timeout for framework re-registration (for example,
--framework_reregister_timeout);
2. Add a new libprocess object to manage orphaned tasks and executors. It
will:
- After a Mesos master restart, clean up the tasks and executors that are
still orphaned once --framework_reregister_timeout has elapsed;
- While the Mesos master is running, clean up the tasks and executors that
have been orphaned for at least --framework_reregister_timeout.
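To make the intended behavior concrete, here is a minimal sketch of the
bookkeeping such an object could do. It is written in Python for brevity, not
as Mesos/libprocess C++ code, and every name in it (OrphanReaper,
master_recovered, framework_reregistered, reap) is hypothetical. It assumes
the master calls reap() on some timer and then kills whatever task IDs are
returned.

```python
import time
from typing import Callable, Dict, List


class OrphanReaper:
    """Tracks frameworks that have not re-registered and expires their tasks.

    Hypothetical sketch: none of these names exist in Mesos.
    """

    def __init__(self, reregister_timeout: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        # Stands in for the proposed --framework_reregister_timeout (seconds).
        self.reregister_timeout = reregister_timeout
        self.clock = clock
        self._orphaned_since: Dict[str, float] = {}
        self._tasks: Dict[str, List[str]] = {}

    def master_recovered(self, tasks_by_framework: Dict[str, List[str]]) -> None:
        # After a master restart, every known framework is treated as
        # orphaned until it re-registers.
        now = self.clock()
        for framework_id, tasks in tasks_by_framework.items():
            self._tasks[framework_id] = list(tasks)
            self._orphaned_since[framework_id] = now

    def framework_reregistered(self, framework_id: str) -> None:
        # The framework came back in time, so its tasks are no longer orphaned.
        self._orphaned_since.pop(framework_id, None)

    def reap(self) -> List[str]:
        # Return the task IDs whose framework has been gone for at least
        # the timeout, and stop tracking them.
        now = self.clock()
        expired = [fid for fid, since in self._orphaned_since.items()
                   if now - since >= self.reregister_timeout]
        killed: List[str] = []
        for fid in expired:
            killed.extend(self._tasks.pop(fid, []))
            del self._orphaned_since[fid]
        return killed
```

The clock is injectable only so that the timeout can be exercised without
waiting. Executors could be tracked the same way as tasks.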
> Resource leak issue in Mesos
> ----------------------------
>
> Key: MESOS-3324
> URL: https://issues.apache.org/jira/browse/MESOS-3324
> Project: Mesos
> Issue Type: Bug
> Reporter: Yong Qiao Wang
> Assignee: Yong Qiao Wang
> Priority: Critical
>
> In the Mesos master recovery case, if a framework exits during the master's
> downtime after it has already launched some long-running tasks, then once
> the master recovers, those tasks keep running as orphaned tasks in the
> Mesos cluster, and no other component can kill them later. This is a
> resource leak in Mesos. I propose adding a timeout in the Mesos master to
> kill those orphaned tasks or executors.