[
https://issues.apache.org/jira/browse/MESOS-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yong Qiao Wang updated MESOS-3403:
----------------------------------
Shepherd: (was: Vinod Kone)
> Add support for removing slaves that do not re-register within the timeout
> (--slave_reregister_timeout) from an external allocator
> --------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-3403
> URL: https://issues.apache.org/jira/browse/MESOS-3403
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Yong Qiao Wang
> Assignee: Yong Qiao Wang
>
> An external Mesos allocator does not run in the same OS process as the Mesos
> master, and it may even be deployed on a different host. In that case the
> Mesos allocator module should be implemented as a proxy that delegates calls
> to the actual allocator.
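
A minimal sketch of the proxy arrangement described above, in C++. All names and signatures here (the two-method Allocator interface, ExternalAllocator, ProxyAllocator, the string-based handle() call) are hypothetical simplifications for illustration; the real Mesos allocator interface has more methods and different signatures, and a real proxy would forward over some RPC transport rather than a direct pointer.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical, heavily simplified allocator interface (assumption:
// only the two calls discussed in this issue are modelled).
class Allocator {
public:
  virtual ~Allocator() = default;
  virtual void addSlave(const std::string& slaveId, double cpus) = 0;
  virtual void removeSlave(const std::string& slaveId) = 0;
};

// Stand-in for the external allocator. In a real deployment this state
// would live in another process, possibly on another host.
class ExternalAllocator {
public:
  void handle(const std::string& call,
              const std::string& slaveId,
              double cpus) {
    if (call == "addSlave") {
      total_[slaveId] = cpus;
    } else if (call == "removeSlave") {
      total_.erase(slaveId);
    }
  }

  // Sum of the CPUs of all slaves currently known to the allocator.
  double totalCpus() const {
    double sum = 0.0;
    for (const auto& entry : total_) {
      sum += entry.second;
    }
    return sum;
  }

private:
  std::map<std::string, double> total_;
};

// The module loaded into the master. It keeps no resource state of its
// own and only forwards each call to the external allocator.
class ProxyAllocator : public Allocator {
public:
  explicit ProxyAllocator(ExternalAllocator* remote) : remote_(remote) {
    assert(remote != nullptr);
  }

  void addSlave(const std::string& slaveId, double cpus) override {
    remote_->handle("addSlave", slaveId, cpus);
  }

  void removeSlave(const std::string& slaveId) override {
    remote_->handle("removeSlave", slaveId, 0.0);
  }

private:
  ExternalAllocator* remote_;
};
```

Because the proxy holds no state, the external allocator only learns about a change when the master actually makes the corresponding call through the module.
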
> Such an external allocator stores the total and allocated resources itself,
> so after a Mesos master recovery (such as a fail-over) it needs to sync up
> with the Mesos master. Under normal circumstances, all slaves reregister
> after the master recovers, so the total and used resources of each slave can
> be synced in the allocator->addSlave function call. In the abnormal case,
> however, a slave does not reregister after the master recovers. The master
> then calls Master::removeSlave(const Registry::Slave& slave) to remove this
> slave from the Registry after the timeout (slave_reregister_timeout), but
> this function does not call the allocator to remove the related resources.
> To support resource sync-up with the external allocator in this abnormal
> case, Master::removeSlave(const Registry::Slave& slave) needs to be enhanced
> to call allocator->removeSlave, so that the related resources are removed
> from the external allocator.
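
The proposed change can be sketched roughly as follows. This is not the actual Mesos master code: the Master class, its recovered-slave set, reregister(), reregisterTimeout() and the RecordingAllocator are hypothetical stand-ins, and slaves are identified by plain strings rather than Registry::Slave. The only point illustrated is that the removal path taken after slave_reregister_timeout also notifies the allocator.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical minimal allocator hook (assumption: only removeSlave is
// modelled here).
struct Allocator {
  virtual ~Allocator() = default;
  virtual void removeSlave(const std::string& slaveId) = 0;
};

// Test double that records which slaves the master asked it to remove.
struct RecordingAllocator : Allocator {
  std::vector<std::string> removed;

  void removeSlave(const std::string& slaveId) override {
    removed.push_back(slaveId);
  }
};

// Hypothetical stand-in for the master's state after a fail-over.
class Master {
public:
  Master(Allocator* allocator, std::set<std::string> recovered)
    : allocator_(allocator), recovered_(std::move(recovered)) {
    assert(allocator != nullptr);
  }

  // A slave that reregisters is no longer waiting for the timeout.
  void reregister(const std::string& slaveId) {
    recovered_.erase(slaveId);
  }

  // Invoked once slave_reregister_timeout has elapsed: every slave that
  // is still in the recovered set did not reregister and is removed.
  void reregisterTimeout() {
    for (const std::string& slaveId : recovered_) {
      removeSlave(slaveId);
    }
    recovered_.clear();
  }

private:
  void removeSlave(const std::string& slaveId) {
    // ... removal from the registry would happen here ...

    // Proposed addition: also tell the allocator, so that an external
    // allocator can drop the resources it still tracks for this slave.
    allocator_->removeSlave(slaveId);
  }

  Allocator* allocator_;
  std::set<std::string> recovered_;
};
```

With three recovered slaves of which two reregister before the timeout, only the third one is reported to the allocator for removal.
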
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)