+1, this proposal looks sound to me. I'll leave any minor feedback on the doc but none of it will be blocking.
On Mon, Sep 4, 2017 at 10:31 AM, Erb, Stephan <stephan....@blue-yonder.com> wrote:
> Thanks for the detailed design document and the in-depth walkthrough [1]!
> Your proposal seems to be sound. (But be warned, I don’t have much
> experience in this part of Aurora or Mesos :-))
>
> [1] https://docs.google.com/presentation/d/1fQMfNLaRex9rJyq3h08HIujtpULoYnpFFV7-P6p6Zt0/edit#slide=id.p4
>
> On 31.08.17, 04:18, "Jordan Ly" <jordan....@gmail.com> wrote:
>
>     Hi everyone,
>
>     Following up on the discussion here:
>     https://lists.apache.org/thread.html/e31d7dbcb054ed570f969ae2043eadfc090383edfe0751cec59b29d3@%3Cdev.aurora.apache.org%3E
>
>     I've created a design document detailing the implementation of a "hot
>     standby" mechanism where scheduler followers would eagerly read and
>     apply entries from the replicated log. The goal of this change is
>     that, in the event of a failover, the newly elected follower will not
>     have to replay as many entries to rebuild its state and thus can start
>     serving traffic faster.
>
>     https://docs.google.com/document/d/1DOtKA4-vrtxat1MaUYMQ6Y1iXhA8ob6Mfztzt-R1Oss/edit?usp=sharing
>
>     I have a working prototype of the above design running on a test
>     cluster. Please feel free to comment on the doc!
>
>     This document references a current proposal in Mesos by Ilya Pronin
>     here: https://lists.apache.org/thread.html/1b8fd10e151054a85c9ea3dc808f7fecb9a87fe5f5e87b10caa46e2a@%3Cdev.mesos.apache.org%3E
>
>     Cheers,
>
>     Jordan Ly