Thanks for the design doc. Those graphs look awesome. We probably should get those into this doc: https://github.com/apache/mesos/blob/master/docs/replicated-log-internals.md
Regarding the doc, I'd like to see a correctness argument for why the catch-up process won't demote the current leader. For instance, at which position do you stop the catch-up, and why? The correctness argument does not have to be very formal. Something similar to the one in the following section should be sufficient:
https://github.com/apache/mesos/blob/master/docs/replicated-log-internals.md#catch-up

- Jie

On Wed, Nov 2, 2016 at 9:35 PM, Igor Morozov <[email protected]> wrote:

> Hi mesos developers,
>
> I have been working on a technical proposal to improve availability and
> failover strategy for mesos replicated log library. The primary use case
> we'd like to improve is aurora scheduler's leader failover and expensive
> replicated log compaction that gets run at a new leader start.
>
> Joseph Wu from Mesosphere was shepherding the proposal:
>
> https://docs.google.com/document/d/1gSn7tOzCrG6w8vCSLtQ2ruIolZNsC4g2mi6jBsuDrHk/edit#
>
> Please review it, any feedback is highly appreciated.
>
> -Igor Morozov
