[ https://issues.apache.org/jira/browse/MESOS-4659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17038595#comment-17038595 ]
Vinod Kone commented on MESOS-4659:
-----------------------------------

I don't have the bandwidth right now, but I'm happy to review the code if you work on a patch. Please see the instructions here: https://mesos.readthedocs.io/en/latest/submitting-a-patch/

> Avoid leaving orphan task after framework failure + master failover
> -------------------------------------------------------------------
>
>                 Key: MESOS-4659
>                 URL: https://issues.apache.org/jira/browse/MESOS-4659
>             Project: Mesos
>          Issue Type: Bug
>          Components: master
>            Reporter: Neil Conway
>            Priority: Major
>              Labels: failover, mesosphere
>
> If a framework becomes disconnected from the master, its tasks are killed
> after waiting for {{failover_timeout}}.
> However, if a master failover occurs but a framework never reconnects to the
> new master, we never kill any of the tasks associated with that framework.
> These tasks remain orphaned and presumably would need to be manually removed
> by the operator. Similarly, if a framework gets torn down or disconnects
> while it has running tasks on a partitioned agent, those tasks are not
> shut down when the agent reregisters.
> We should consider whether to kill such orphaned tasks automatically, likely
> after waiting for some (framework-configurable?) timeout.
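
For anyone picking this up, the behaviour proposed in the description can be sketched as a toy model. This is not Mesos code and does not reflect how the master is actually structured: the class OrphanTracker, its method names, and the orphan_timeout setting are all hypothetical, and the sketch only assumes that a newly elected master learns about running tasks from agents as they reregister.

```python
from dataclasses import dataclass, field


@dataclass
class OrphanTracker:
    """Toy model of a newly elected master (hypothetical, not Mesos code)."""

    orphan_timeout: float  # hypothetical grace period, in seconds
    elected_at: float = 0.0  # time at which this master was elected
    reconnected: set = field(default_factory=set)  # framework ids seen since failover
    tasks: dict = field(default_factory=dict)  # task id -> owning framework id

    def agent_reregistered(self, running_tasks):
        # Assumption: a reregistering agent reports the tasks it is running.
        self.tasks.update(running_tasks)

    def framework_reregistered(self, framework_id):
        # A framework that comes back keeps ownership of its tasks.
        self.reconnected.add(framework_id)

    def tasks_to_kill(self, now):
        # Nothing is killed until the grace period has elapsed.
        if now - self.elected_at < self.orphan_timeout:
            return []
        # After that, tasks whose framework never came back are orphans.
        return sorted(
            task_id
            for task_id, framework_id in self.tasks.items()
            if framework_id not in self.reconnected
        )


tracker = OrphanTracker(orphan_timeout=60.0)
tracker.agent_reregistered({"t1": "fw-a", "t2": "fw-b"})
tracker.framework_reregistered("fw-a")
early = tracker.tasks_to_kill(now=30.0)  # still inside the grace period
late = tracker.tasks_to_kill(now=60.0)   # fw-b never reconnected
```

Whether the timeout should be a master-wide setting or taken per framework (as the description's "framework-configurable?" suggests) is left open here; the sketch uses a single value only to keep the model small.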