[jira] [Commented] (MESOS-295) Allow new masters to have better understanding of cluster state

Bill Farner (JIRA) Fri, 08 Nov 2013 12:08:52 -0800

    [ 
https://issues.apache.org/jira/browse/MESOS-295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13817636#comment-13817636
 ]


Bill Farner commented on MESOS-295:
-----------------------------------

Is the following scenario handled cleanly?

1. Master goes down
2. Slave goes down and never returns
3. New master comes up

Will the framework be notified of lost tasks on the slave?
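
To make the scenario concrete, here is a minimal sketch in Python of one way a new master could handle it. This is not Mesos code. It assumes the new master can recover a persisted list of slaves and their tasks, and that it waits for a re-registration window before declaring anything lost. The names NewMaster, registry, reregister_timeout and lost_tasks are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NewMaster:
    # Hypothetical: slave id -> task ids, recovered from some persisted registry.
    registry: dict[str, list[str]]
    # Hypothetical re-registration window in seconds; the value is arbitrary.
    reregister_timeout: float = 600.0
    reregistered: set[str] = field(default_factory=set)

    def slave_reregistered(self, slave_id: str) -> None:
        """Record that a slave came back after the failover."""
        self.reregistered.add(slave_id)

    def lost_tasks(self, elapsed: float) -> list[str]:
        """Tasks to report as lost once the re-registration window has closed."""
        if elapsed < self.reregister_timeout:
            return []  # still waiting; too early to declare anything lost
        return sorted(
            task
            for slave_id, tasks in self.registry.items()
            if slave_id not in self.reregistered
            for task in tasks
        )
```

Under these assumptions, a slave that went down during step 2 never calls slave_reregistered, so its tasks show up in lost_tasks after the timeout and the framework can be told about them. Without a persisted registry the new master would have no record of that slave at all, which is the case the question is about.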

> Allow new masters to have better understanding of cluster state
> ---------------------------------------------------------------
>
>                 Key: MESOS-295
>                 URL: https://issues.apache.org/jira/browse/MESOS-295
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Joe Smith
>            Assignee: Benjamin Hindman
>            Priority: Critical
>              Labels: twitter
>             Fix For: 0.15.0
>
>
> If a new master becomes elected, it will only have knowledge of the current 
> state of the cluster. This can lead to a situation where tasks become lost 
> but aren't properly killed. For instance:
> 1) A set of machines (perhaps a datacenter rack) loses network connectivity 
> and their tasks are marked LOST by the master. However, they're still running.
> 2) Through a potentially unrelated situation, there is a master failover to a 
> new master
> 3) The network connection to the machines comes back up
> 4) These slaves never killed their tasks (and they shouldn't if they can't 
> talk to a master)
> 5) Tasks stay running and aren't killed, taking up resources and running 
> outside the scope of the new master
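
The five steps above can be illustrated with a small Python sketch of the reconciliation a new master might perform when a slave reconnects. It is only an illustration, not the Mesos implementation. It assumes the new master somehow has a durable record of which tasks the old master already marked LOST, which is exactly the knowledge this issue asks for. The function name reconcile and its parameters are hypothetical.

```python
def reconcile(reported_by_slave: list[str], marked_lost: set[str]) -> tuple[list[str], list[str]]:
    """Split the tasks a reconnecting slave reports into (keep, kill).

    reported_by_slave: task ids the slave says are still running.
    marked_lost: task ids the previous master already reported as LOST
                 (assumed to be recoverable by the new master).
    """
    keep = [t for t in reported_by_slave if t not in marked_lost]
    # Tasks already reported LOST to the framework should not keep running.
    kill = [t for t in reported_by_slave if t in marked_lost]
    return keep, kill
```

If marked_lost is empty because the new master has no history, kill is always empty and the stale tasks in step 5 keep running.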



--
This message was sent by Atlassian JIRA
(v6.1#6144)
