This had been put on the back burner in favor of shipping other features. Work is beginning on the Registrar again, and I've created a design doc here: https://cwiki.apache.org/confluence/display/MESOS/Registrar+Design+Document
Expect to see new reviews in this area as well! Please reach out with any questions / feedback.

On Thu, Oct 31, 2013 at 5:02 PM, Benjamin Mahler <[email protected]> wrote:
> Hi All,
>
> I'd like to mention some changes that have been discussed amongst the
> committers but have not yet been shared broadly with the list.
>
> The central component of Mesos is the "Master". The Master is responsible
> for administering slaves, frameworks, and resource offers. It also handles
> task launching requests, status updates, and framework messages. As you may
> or may not know, the Master is currently stateless, in that it does not
> persist any information across failovers. Rather, the Master currently
> recovers all of its state from the slaves and frameworks that re-register
> after a failover.
>
> This design has many benefits. First, failing over a Master is a trivial
> operation. Second, we do not have the performance overhead and complexity
> of dealing with persistent state. However, this design opens up a few cases
> for information loss in the system. For example, when no Master is running
> and a Slave fails permanently, there's no knowledge of this in the failed
> over Master.
>
> In order to detect these events, we'd like to add persistence of the
> registered slaves. The first step for this was creating the Registrar:
> https://reviews.apache.org/r/14383/
> https://reviews.apache.org/r/14384/
> https://reviews.apache.org/r/15099/
> https://reviews.apache.org/r/15100/
>
> The Registrar is responsible for keeping the official records of the
> master. This will initially include SlaveInfo in order to correctly handle
> cases like the example I provided above. The Registrar is agnostic to the
> underlying data storage and can be backed by a local LevelDB, by ZooKeeper
> (for high availability Masters), and in the future by our reconfigurable
> replicated log.
>
> The next steps are to implement "statefulness" in the Master using the
> Registrar. So far I've sent out some of the preliminary cleanup work, and I
> have a few pending patches that I'm in the process of cleaning up that
> implement this fully so keep an eye out for those.
>
> In the longer term, we will add persistence of framework information in
> the same vein. That is, handling framework failures in the presence of
> Master failures.
>
> Cheers!
> Ben
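For anyone less familiar with the idea, here is a rough C++ sketch of the storage-agnostic Registrar described in the quoted message, and of how a stateful Master could use it to catch the "slave fails permanently while no Master is running" case. Everything here is hypothetical and simplified: the Storage, InMemoryStorage and Registrar classes and the admit(), remove(), contains() and unreregistered() methods are made-up names for illustration, SlaveInfo is reduced to an ID and a hostname, and this is not the code in the reviews linked above or in the design doc. Real backends (local LevelDB, ZooKeeper, the replicated log) would implement the same Storage interface; only an in-memory one is shown.

```cpp
#include <map>
#include <memory>
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-in for SlaveInfo.
struct SlaveInfo {
  std::string id;
  std::string hostname;
};

// Hypothetical storage interface the Registrar is written against.
// LevelDB / ZooKeeper / replicated-log backends would each implement it.
class Storage {
 public:
  virtual ~Storage() = default;
  virtual std::optional<std::string> get(const std::string& key) const = 0;
  virtual void set(const std::string& key, const std::string& value) = 0;
  virtual void remove(const std::string& key) = 0;
  virtual std::vector<std::string> keys() const = 0;
};

// In-memory backend for illustration only: unlike a real backend, its
// contents do not survive a process restart.
class InMemoryStorage : public Storage {
 public:
  std::optional<std::string> get(const std::string& key) const override {
    auto it = data_.find(key);
    if (it == data_.end()) return std::nullopt;
    return it->second;
  }
  void set(const std::string& key, const std::string& value) override {
    data_[key] = value;
  }
  void remove(const std::string& key) override { data_.erase(key); }
  std::vector<std::string> keys() const override {
    std::vector<std::string> result;
    for (const auto& entry : data_) result.push_back(entry.first);
    return result;
  }

 private:
  std::map<std::string, std::string> data_;
};

// Keeps the Master's "official record" of admitted slaves, independent of
// which Storage backend it is given. In this sketch every key is a slave ID.
class Registrar {
 public:
  explicit Registrar(std::shared_ptr<Storage> storage)
      : storage_(std::move(storage)) {}

  void admit(const SlaveInfo& slave) { storage_->set(slave.id, slave.hostname); }
  void remove(const std::string& slaveId) { storage_->remove(slaveId); }
  bool contains(const std::string& slaveId) const {
    return storage_->get(slaveId).has_value();
  }

  // After a Master failover: slaves that are on record but have not
  // re-registered may have failed while no Master was running.
  std::vector<std::string> unreregistered(
      const std::set<std::string>& reregistered) const {
    std::vector<std::string> missing;
    for (const auto& id : storage_->keys()) {
      if (reregistered.count(id) == 0) missing.push_back(id);
    }
    return missing;
  }

 private:
  std::shared_ptr<Storage> storage_;
};
```

To use it, admit slaves through one Registrar, then construct a fresh Registrar over the same storage object to simulate a failed-over Master; calling unreregistered() with the set of slaves that came back returns the ones that were admitted but never re-registered. Keeping the Registrar behind an abstract Storage is what lets the backend be swapped (LevelDB for a single Master, ZooKeeper for HA) without touching the Master's logic.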
