[ https://issues.apache.org/jira/browse/MESOS-7181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anindya Sinha updated MESOS-7181:
---------------------------------
Summary: Stale frameworks seen on Mesos, but not known to scheduler (was:
Stale frameworks seen on Mesos, but not known to schedulers)
> Stale frameworks seen on Mesos, but not known to scheduler
> ----------------------------------------------------------
>
> Key: MESOS-7181
> URL: https://issues.apache.org/jira/browse/MESOS-7181
> Project: Mesos
> Issue Type: Bug
> Components: general
> Reporter: Anindya Sinha
> Assignee: Anindya Sinha
>
> Using a scheduler that launches multiple frameworks through the scheduler
> driver, we occasionally observe a framework on Mesos that is not known to
> the scheduler. Since no entity acts on the offers, this framework ends up
> hogging all the offers, leading to starvation in the cluster.
> The scenario is as follows:
> 1) The scheduler calls driver.start(), which results in the 1st SUBSCRIBE
> being sent to the master.
> 2) The scheduler driver resends the SUBSCRIBE as part of its exponential
> backoff, since the framework has not yet registered.
> 3) The framework is registered based on the 1st SUBSCRIBE, but the scheduler
> immediately issues a driver.stop(), which results in a TEARDOWN being sent
> to the master.
> 4) The master processes the TEARDOWN, which removes the framework.
> 5) The master now processes the 2nd SUBSCRIBE (after authorization) and tries
> to add this framework. This succeeds and a new framework id is generated
> (since the original framework is no longer registered after the TEARDOWN),
> but the scheduler driver has already terminated by then, because the
> scheduler issued the driver.stop(). The master therefore continues to send
> offers to this 2nd framework, which holds on to them until the offer timeout.
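
The ordering in steps 1-5 can be replayed with a small toy model. This is only a sketch and not Mesos code: ToyMaster, replay, and the "driver-A" label are hypothetical names, and the rule that a SUBSCRIBE from an already registered driver reuses its id is an assumption made so that the retry is harmless when it arrives before the TEARDOWN.

```python
import itertools


class ToyMaster:
    """Toy stand-in for the master (hypothetical, not the Mesos implementation)."""

    def __init__(self):
        self.frameworks = {}            # framework id -> owning driver label
        self._ids = itertools.count(1)  # source of newly generated framework ids

    def subscribe(self, driver):
        # Assumption: a SUBSCRIBE from an already registered driver is a retry
        # and maps back to the existing framework id.
        for fid, owner in self.frameworks.items():
            if owner == driver:
                return fid
        # Otherwise the framework is unknown, so a new id is generated.
        fid = "framework-%d" % next(self._ids)
        self.frameworks[fid] = driver
        return fid

    def teardown(self, fid):
        self.frameworks.pop(fid, None)


def replay(messages):
    """Process messages in the order the master handles them.

    Returns the frameworks still registered on the master at the end.
    """
    master = ToyMaster()
    driver_running = True
    known_fid = None  # the framework id the driver has learned, if any
    for kind in messages:
        if kind == "SUBSCRIBE":
            fid = master.subscribe("driver-A")
            if driver_running:
                known_fid = fid  # a stopped driver never learns the new id
        elif kind == "TEARDOWN":
            master.teardown(known_fid)
            driver_running = False  # driver.stop() terminates the driver
    return master.frameworks


# Retry handled before the TEARDOWN: nothing is left behind.
print(replay(["SUBSCRIBE", "SUBSCRIBE", "TEARDOWN"]))  # {}

# Retry handled after the TEARDOWN (steps 3-5): an orphaned framework remains.
print(replay(["SUBSCRIBE", "TEARDOWN", "SUBSCRIBE"]))  # {'framework-2': 'driver-A'}
```

In the second replay the master ends with a registered framework whose driver has already stopped, which matches the stale framework described in the report.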
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)