[Design Doc] Hot Standby in Replicas to Reduce Failover Time

Jordan Ly Wed, 30 Aug 2017 19:19:49 -0700

Hi everyone,

Following up on the discussion here:
https://lists.apache.org/thread.html/e31d7dbcb054ed570f969ae2043eadfc090383edfe0751cec59b29d3@%3Cdev.aurora.apache.org%3E


I've created a design document detailing the implementation of a "hot
standby" mechanism in which scheduler followers eagerly read and apply
entries from the replicated log. The goal of this change is that, in
the event of a failover, the newly elected leader will have fewer
entries to replay when rebuilding its state and can therefore start
serving traffic sooner.

https://docs.google.com/document/d/1DOtKA4-vrtxat1MaUYMQ6Y1iXhA8ob6Mfztzt-R1Oss/edit?usp=sharing
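
As a rough illustration of the idea only (this is not code from the
design doc or from Aurora; the Replica class, its catch_up method, and
the log contents are all hypothetical), the sketch below compares how
many entries a replica has to replay at election time when it has
been applying the log eagerly versus when it starts cold:

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical scheduler replica that rebuilds state from a shared log."""

    state: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries already applied

    def catch_up(self, log):
        """Apply every entry not yet applied; return how many were replayed."""
        pending = log[self.applied:]
        for key, value in pending:
            self.state[key] = value
        self.applied = len(log)
        return len(pending)


# Illustrative log of (key, value) writes; real log entries look different.
log = [("task-%d" % i, "RUNNING") for i in range(1000)]

cold = Replica()         # assumption: never read the log while following
hot = Replica()
hot.catch_up(log[:990])  # assumption: eagerly applied entries while following

# On election, each replica replays only what it is still missing.
cold_replayed = cold.catch_up(log)
hot_replayed = hot.catch_up(log)
```

Both replicas end up with the same state, but the hot one only has to
replay the tail of the log it had not yet seen. The sketch ignores
snapshots, log truncation, and concurrent writers.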

I have a working prototype of the above design running on a test
cluster. Please feel free to comment on the doc!

This document references a current proposal in Mesos by Ilya Pronin
here: 
https://lists.apache.org/thread.html/1b8fd10e151054a85c9ea3dc808f7fecb9a87fe5f5e87b10caa46e2a@%3Cdev.mesos.apache.org%3E

Cheers,

Jordan Ly
