Improving aurora availability and failover recovery time

Igor Morozov Mon, 29 Aug 2016 17:00:28 -0700

Folks,

I'm looking at improving availability for aurora followers. Ideally we'd
like to achieve two things:
1. support active reads from aurora followers
2. improve recovery time for aurora master failover, which currently scales
with the number of log entries in the replicated log since the latest
snapshot.


We believe both goals could be addressed by enabling active reads from
stand-by aurora replicas and doing periodic snapshots, similar to what the
elected aurora master does.
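
To illustrate why recovery time tracks the number of entries since the last
snapshot, here is a minimal Python sketch. The `recover` function and the
key/value log format are hypothetical, chosen only for illustration; this is
not how aurora stores its state.

```python
from typing import Dict, Iterable, Tuple

# One log entry: (position, (key, value)). Hypothetical format for illustration.
LogEntry = Tuple[int, Tuple[str, str]]


def recover(
    snapshot_state: Dict[str, str],
    snapshot_position: int,
    log_entries: Iterable[LogEntry],
) -> Tuple[Dict[str, str], int]:
    """Rebuild state from a snapshot plus the entries written after it.

    Returns the recovered state and the number of entries replayed, which is
    the part of recovery that grows with the distance from the last snapshot.
    """
    state = dict(snapshot_state)
    replayed = 0
    for position, (key, value) in log_entries:
        if position <= snapshot_position:
            continue  # already reflected in the snapshot
        state[key] = value
        replayed += 1
    return state, replayed
```

With a recent snapshot only a handful of entries are replayed; with a stale
one, every entry since then has to be applied before the new master can serve.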

Here is my recollection of what is going on; please correct it if it is
lacking details or is simply wrong. I have also cc'ed the authors of the
mesos replicated log:

So it seems the main reason an aurora follower cannot be used for reads is
that the mesos replicated log does not guarantee that all log entries in the
acceptor/learner (essentially the aurora read replica) have been "learned"
in the Paxos sense. The interface that the mesos replicated log provides
allows range reads (from, to) and fails immediately if any log entry in the
range does not have the "learned" attribute set:
https://github.com/apache/mesos/blob/0d3793e94adcd6dc91d06404f205639cebd753fd/src/log/log.cpp#L421
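
To make the failure mode concrete, here is a minimal Python sketch of a range
read that rejects any position that is not learned. The `Entry` class, the
`UnlearnedPosition` exception and `read_range` are hypothetical stand-ins for
illustration, not the mesos API, and treating a missing position the same as
an unlearned one is my assumption.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Entry:
    position: int
    value: bytes
    learned: bool  # True once the value is known to be chosen (Paxos sense)


class UnlearnedPosition(Exception):
    """Raised when a requested position has not been learned locally."""


def read_range(log: Dict[int, Entry], start: int, end: int) -> List[Entry]:
    """Return entries in [start, end], failing on the first unlearned one."""
    result = []
    for pos in range(start, end + 1):
        entry = log.get(pos)
        # Assumption: a missing position is treated like an unlearned one.
        if entry is None or not entry.learned:
            raise UnlearnedPosition(f"position {pos} is not learned")
        result.append(entry)
    return result
```

A single unlearned position anywhere in the range fails the whole read, which
is why a follower with even one such entry cannot serve that range.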

Hence the design choice in aurora to do full reads from the coordinator's
mesos replicated log, whose entries are apparently guaranteed to be learned:

https://github.com/apache/aurora/blob/b24619b28c4dbb35188871bacd0091a9e01218e3/src/main/java/org/apache/aurora/scheduler/log/mesos/MesosLog.java#L227

There is an open ticket in mesos for providing streaming support in the
mesos replicated log:
https://issues.apache.org/jira/browse/MESOS-1944

It seems this feature, once implemented, could resolve most if not all of
the concerns that block aurora followers from being used for reads.

I don't understand this paragraph though:

"If an unlearned position is encountered, there are couple of options. One
option is to wait until it gets learned. However, it's likely that that
position never gets learned. We also don't wanna blindly do active read
(full paxos round) because it will possibly demote the leader. One solution
is to wait for the next learned event and then do active read for the
positions in-between."

Why would performing an active read for the positions in-between, once a
new learned position is available, not cause coordinator demotion?

It looks like, in principle, the same semantics could be supported for the
mesos log's range reads without changing its interface (I understand it may
lead to unpredictable latencies).
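
As a rough Python sketch of what I mean, under my own assumptions: the range
read fills an unlearned position only when a later position is already
learned locally, and leaves the tail for a later call. The `Log` type,
`read_with_gap_fill` and the `active_read` callback (standing in for a full
paxos round) are all hypothetical. It only illustrates the ordering described
in the quoted paragraph; whether that active read is safe for the coordinator
is exactly the open question above.

```python
from typing import Callable, Dict, List, Tuple

# position -> (value, learned flag); a hypothetical view of a follower's log
Log = Dict[int, Tuple[bytes, bool]]


def read_with_gap_fill(
    log: Log,
    start: int,
    end: int,
    active_read: Callable[[int], bytes],
) -> List[Tuple[int, bytes]]:
    """Read [start, end], filling only gaps that lie below a learned position.

    Positions above the highest locally learned position are not returned;
    the caller is expected to retry after the next learned event.
    """
    learned = [pos for pos, (_, ok) in log.items() if ok]
    high = max(learned, default=start - 1)
    out = []
    for pos in range(start, min(end, high) + 1):
        value, ok = log.get(pos, (b"", False))
        if not ok:
            # A later position is already learned, so this one is "in-between":
            # run the (hypothetical) active read and remember the result.
            value = active_read(pos)
            log[pos] = (value, True)
        out.append((pos, value))
    return out
```

The latency of such a read depends on how many gaps have to be filled, which
is the unpredictability mentioned above.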

Thoughts?
-Igor
