Apologies for the long delay. I wouldn't call it experimental (that comment is stale); you should feel free to turn on strictness. Strictness enforces that agents removed by an old master cannot re-join under a new master. This preserves the steady-state behavior: once the master removes an agent, it does not allow that agent to return. Ideally, the flag would be removed and strictness would be the default, but we didn't feel comfortable removing it until we had state backup support in the master. Turning off strictness provides an escape hatch if state is lost. That said, now that we persist more than just the list of agents, this escape hatch doesn't restore the other state (such as maintenance schedules and quota information).
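To make the strictness rule above concrete, here is a rough toy model in Python. It is only a sketch of the semantics as described in this mail, not Mesos code: the `Registry` class, its `reregister` method, and the agent IDs are all hypothetical names made up for illustration.

```python
class Registry:
    """Toy model of a master's agent registry (hypothetical, not Mesos code)."""

    def __init__(self, strict, removed=()):
        self.strict = strict
        self.admitted = set()
        # Agents that a previous master removed (assumed to be persisted).
        self.removed = set(removed)

    def reregister(self, agent_id):
        """Return True if the agent is allowed to (re-)join the cluster."""
        if self.strict and agent_id in self.removed:
            # Strict: a removed agent may not come back under a new master.
            return False
        # Non-strict: the agent is let back in, so frameworks must cope
        # with a removed agent re-surfacing.
        self.removed.discard(agent_id)
        self.admitted.add(agent_id)
        return True


strict = Registry(strict=True, removed={"agent-1"})
lenient = Registry(strict=False, removed={"agent-1"})
print(strict.reregister("agent-1"), lenient.reregister("agent-1"))
```

In this model the only difference between the two modes is whether the persisted list of removed agents is enforced when an agent tries to come back.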
As for why it's not on by default today: we found that many frameworks, like Aurora and Marathon, are capable of handling a removed agent re-surfacing in the cluster, so it wasn't critical to turn this on. We also realized that we need to re-work the partition handling in Mesos in order to give frameworks control over how to react to an unreachable agent.

Does that clarify things?

On Mon, Feb 1, 2016 at 11:04 AM, Zhitao Li <[email protected]> wrote:

> Hi,
>
> I've been reading related documentation on Mesos website and trying to
> understand the current status of registrar.
>
> I noticed that we still consider "--registrar_strict" as experimental, but
> I can't find the back story of what's needed to finish the project or the
> JIRA so tracking that.
>
> Also, does anyone have recommendations on whether we should turn this flag
> on, and what benefits cluster operator would get?
>
> Thanks.
