[
https://issues.apache.org/jira/browse/MESOS-3403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yong Qiao Wang updated MESOS-3403:
----------------------------------
Description:
An external Mesos allocator does not run in the same OS process as the Mesos
master and may even be deployed on a different host. In that case, the Mesos
allocator module should be implemented as a proxy that delegates calls to the
actual allocator.
The external allocator stores the total and allocated resources itself, so
after a Mesos master recovery (such as a fail-over) it needs to sync up with
the master. Under normal circumstances, all slaves re-register after the master
recovers, so the total and used resources of each slave can be synced in the
allocator->addSlave call. In the abnormal case, however, a slave does not
re-register after the master recovers. The master then calls
Master::removeSlave(const Registry::Slave& slave) to remove that slave from the
Registry once the timeout (--slave_reregister_timeout) expires, but this
function does not ask the allocator to remove the related resources. To support
resource sync-up with the external allocator in this abnormal case,
Master::removeSlave(const Registry::Slave& slave) should be enhanced to call
allocator->removeSlave, so that the related resources are also removed from the
external allocator.
> Add support for removing slaves that did not re-register within the timeout
> (--slave_reregister_timeout) from an external allocator
> --------------------------------------------------------------------------------------------------------------------
>
> Key: MESOS-3403
> URL: https://issues.apache.org/jira/browse/MESOS-3403
> Project: Mesos
> Issue Type: Improvement
> Components: master
> Reporter: Yong Qiao Wang
> Assignee: Yong Qiao Wang
>
> An external Mesos allocator does not run in the same OS process as the Mesos
> master and may even be deployed on a different host. In that case, the Mesos
> allocator module should be implemented as a proxy that delegates calls to the
> actual allocator.
> The external allocator stores the total and allocated resources itself, so
> after a Mesos master recovery (such as a fail-over) it needs to sync up with
> the master. Under normal circumstances, all slaves re-register after the
> master recovers, so the total and used resources of each slave can be synced
> in the allocator->addSlave call. In the abnormal case, however, a slave does
> not re-register after the master recovers. The master then calls
> Master::removeSlave(const Registry::Slave& slave) to remove that slave from
> the Registry once the timeout (--slave_reregister_timeout) expires, but this
> function does not ask the allocator to remove the related resources. To
> support resource sync-up with the external allocator in this abnormal case,
> Master::removeSlave(const Registry::Slave& slave) should be enhanced to call
> allocator->removeSlave, so that the related resources are also removed from
> the external allocator.
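
The behaviour requested above can be illustrated with a small, self-contained
C++ sketch. This is not Mesos code: Resources, Allocator, ExternalAllocator,
ProxyAllocator and the simplified Master below are hypothetical stand-ins that
use plain string slave IDs, and the real allocator interface takes more
parameters. The sketch only shows the idea that the registry-removal path, once
the re-registration timeout expires, should also notify the allocator.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, simplified resource record (not the Mesos Resources class).
struct Resources {
  double cpus;
  double mem;
};

// Hypothetical, reduced allocator interface with only the two calls
// discussed in this issue.
class Allocator {
public:
  virtual ~Allocator() = default;
  virtual void addSlave(const std::string& slaveId,
                        const Resources& total,
                        const Resources& used) = 0;
  virtual void removeSlave(const std::string& slaveId) = 0;
};

// Stand-in for the external allocator. In a real deployment this state
// would live in another process and survive a master fail-over.
class ExternalAllocator : public Allocator {
public:
  void addSlave(const std::string& slaveId,
                const Resources& total,
                const Resources& used) override {
    slaves_[slaveId] = std::make_pair(total, used);
  }

  void removeSlave(const std::string& slaveId) override {
    slaves_.erase(slaveId);
  }

  bool knows(const std::string& slaveId) const {
    return slaves_.count(slaveId) > 0;
  }

private:
  std::map<std::string, std::pair<Resources, Resources>> slaves_;
};

// Stand-in for the allocator module loaded by the master: it only
// forwards calls (here a direct call, in practice some form of RPC).
class ProxyAllocator : public Allocator {
public:
  explicit ProxyAllocator(Allocator& backend) : backend_(backend) {}

  void addSlave(const std::string& slaveId,
                const Resources& total,
                const Resources& used) override {
    backend_.addSlave(slaveId, total, used);
  }

  void removeSlave(const std::string& slaveId) override {
    backend_.removeSlave(slaveId);
  }

private:
  Allocator& backend_;
};

// Heavily simplified master that has just recovered its registry.
class Master {
public:
  Master(Allocator& allocator, std::set<std::string> recovered)
    : allocator_(allocator),
      registry_(recovered),
      pending_(std::move(recovered)) {}

  // Normal case: the slave re-registers and its resources are re-synced.
  void reregisterSlave(const std::string& slaveId,
                       const Resources& total,
                       const Resources& used) {
    pending_.erase(slaveId);
    allocator_.addSlave(slaveId, total, used);
  }

  // Abnormal case: the re-registration timeout expired.
  void reregisterTimeout() {
    const std::set<std::string> expired = pending_;  // copy, we erase below
    for (const std::string& slaveId : expired) {
      removeSlave(slaveId);
    }
  }

  bool inRegistry(const std::string& slaveId) const {
    return registry_.count(slaveId) > 0;
  }

private:
  void removeSlave(const std::string& slaveId) {
    pending_.erase(slaveId);
    registry_.erase(slaveId);
    allocator_.removeSlave(slaveId);  // the call this issue asks to add
  }

  Allocator& allocator_;
  std::set<std::string> registry_;
  std::set<std::string> pending_;
};
```

In this sketch, if the external allocator already knows slaves "s1" and "s2"
from before the fail-over and only "s1" re-registers, calling
reregisterTimeout() drops "s2" from both the registry and the external
allocator, so the two stay consistent.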