Master::failoverFramework should remove existing framework offers last
----------------------------------------------------------------------
Key: MESOS-109
URL: https://issues.apache.org/jira/browse/MESOS-109
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Hindman
Priority: Critical
It looks like there is a bug in failing over a framework. When the master
removes the framework's existing offers, it invokes the allocator's
"resourcesRecovered" callback. The current implementation of that callback
makes new offers of the recovered resources to existing frameworks. In this
case, however, the only existing framework is the one being failed over, and
it still has a bogus PID. When the allocator calls back into the master to
send an offer to that framework, the master uses the bogus PID and the offers
are sent into oblivion.
The short-term fix is to remove the existing offers only after all of the
failover logic has been performed (see Master::failoverFramework). The
long-term fix is to run the allocator independently of the master (as its own
libprocess process) so that we don't have to reason about complicated control
flow interactions between the two.
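
To make the ordering problem concrete, here is a minimal toy model of the two
orderings. It is a sketch, not the Mesos code: ToyMaster, Framework,
removeOffers, the two failover methods, and recipientAfterFailover are all
hypothetical names, the allocator callback is collapsed into a method on the
master, and a stale PID string stands in for the "bogus PID" described above.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a registered framework.
struct Framework {
  std::string pid;                  // Address that offers are sent to.
  std::vector<std::string> offers;  // Outstanding offers, by resource name.
};

// Hypothetical stand-in for the master with the allocator folded in.
struct ToyMaster {
  Framework framework;
  // Records which PID each re-offered resource was sent to.
  std::map<std::string, std::vector<std::string>> delivered;

  // Models the allocator's "resourcesRecovered" callback: it immediately
  // re-offers the resource to the only framework, at its current PID.
  void resourcesRecovered(const std::string& resource) {
    delivered[framework.pid].push_back(resource);
  }

  // Removing an offer hands its resources back to the allocator.
  void removeOffers() {
    std::vector<std::string> old;
    old.swap(framework.offers);
    for (const std::string& resource : old) {
      resourcesRecovered(resource);
    }
  }

  // Ordering described in the report: offers are removed while the
  // framework still has the stale PID.
  void failoverRemovingOffersFirst(const std::string& newPid) {
    removeOffers();
    framework.pid = newPid;
  }

  // Proposed short-term fix: finish the failover, then remove offers.
  void failoverRemovingOffersLast(const std::string& newPid) {
    framework.pid = newPid;
    removeOffers();
  }
};

// Returns the PID that ends up receiving the re-offered resources.
std::string recipientAfterFailover(bool removeOffersLast) {
  ToyMaster master;
  master.framework = Framework{"stale-pid", {"cpus", "mem"}};
  if (removeOffersLast) {
    master.failoverRemovingOffersLast("new-pid");
  } else {
    master.failoverRemovingOffersFirst("new-pid");
  }
  assert(master.delivered.size() == 1);  // All offers went to a single PID.
  return master.delivered.begin()->first;
}
```

In this model, recipientAfterFailover(false) yields the stale PID, while
recipientAfterFailover(true) yields the new one. The only difference between
the two paths is whether the PID is updated before the synchronous callback
runs, which is the control flow coupling the long-term fix is meant to remove.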