Dario Rexin created MESOS-2301:
----------------------------------

             Summary: Slave does not cleanly unregister
                 Key: MESOS-2301
                 URL: https://issues.apache.org/jira/browse/MESOS-2301
             Project: Mesos
          Issue Type: Bug
          Components: master, slave
            Reporter: Dario Rexin


If a machine running the Mesos slave is rebooted, the slave performs a clean 
shutdown: it stops all its executors, unregisters from the master, and removes 
the symlink to the latest state. 

However, if the master is not reachable during the reboot, the slave still 
removes the symlink to the latest state and registers with a new ID when 
restarted. The master then waits for the old slave to come back for the 
configured amount of time and does not mark its tasks as lost or killed. This 
also means that these tasks will not be restarted by the framework (in this 
case Marathon), because it assumes they are still alive.

This problem could be solved by introducing a new message, 
`SlaveUnregisteredMessage`, that is sent by the master when a slave has 
successfully unregistered. The slave would wait for this message and, if it 
does not receive it, would not remove the symlink to `latest`. 
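
To illustrate the proposed handshake, here is a minimal sketch in Python rather 
than Mesos code. Only the message name `SlaveUnregisteredMessage` comes from the 
proposal above. The function `shutdown_slave`, the `inbox` queue standing in for 
the connection to the master, and the `ack_timeout_secs` parameter are all 
hypothetical and chosen for illustration.

```python
import os
import queue

# Message name taken from the proposal; it does not exist in Mesos today.
SLAVE_UNREGISTERED = "SlaveUnregisteredMessage"


def shutdown_slave(inbox, latest_symlink, ack_timeout_secs=5.0):
    """Remove the `latest` symlink only if the master confirmed unregistration.

    `inbox` is a hypothetical queue.Queue of messages received from the master.
    Returns True if the symlink was removed, False if the state was kept.
    """
    try:
        # Wait a bounded time for the next message from the master.
        message = inbox.get(timeout=ack_timeout_secs)
    except queue.Empty:
        message = None  # master unreachable or too slow to answer

    if message != SLAVE_UNREGISTERED:
        # No confirmation: keep the state so the slave can recover
        # under its old ID after the restart.
        return False

    # Confirmed: the master knows the slave is gone, so it is safe to
    # discard the pointer to the latest state.
    if os.path.islink(latest_symlink):
        os.unlink(latest_symlink)
    return True
```

In this sketch, any message other than the confirmation is treated the same as 
no answer at all, so the slave errs on the side of keeping its state.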



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
