> On Oct. 19, 2013, 11:22 p.m., Benjamin Hindman wrote:
> > src/log/coordinator.cpp, lines 540-543
> > <https://reviews.apache.org/r/14631/diff/3/?file=364789#file364789line540>
> >
> >     But this might be slow, or worse, cause really weird non-deterministic
> >     performance (e.g., every once in a while we re-run fill because the
> >     local replica hadn't yet processed the learn, giving weird performance
> >     blips). I think this part needs a little more thinking. What about
> >     sending a learn directly to the local replica in addition to
> >     broadcasting to the others? I could imagine another log::learn that
> >     also takes the replica pointer and only returns a future for once the
> >     local replica has received a LearnResponse, but not the rest of the
> >     network. And if you don't want to send the LearnRequest twice to the
> >     local replica, just filter that pid out when you broadcast to the
> >     network. And let's capture some TODOs somewhere about not writing the
> >     data twice (once for the WriteRequest and once for the WriteResponse)
> >     by using something like checksums.
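The suggestion above can be sketched roughly as follows. This is illustrative only: `Replica`, `Network`, and `learn()` are simplified stand-ins, not the actual types or signatures in src/log/ (which use libprocess futures rather than synchronous calls).

```cpp
// Sketch of the suggested log::learn variant: wait only for the local
// replica's LearnResponse, and filter the local pid out of the network
// broadcast so the LearnRequest is not sent to it twice.
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Simplified stand-in for the real replica type.
struct Replica {
  std::string pid;
  bool learned = false;
  // Deliver a LearnRequest; returns true once the position is learned.
  bool learn(uint64_t position) { learned = true; return true; }
};

// Simplified stand-in for the real network type.
struct Network {
  std::set<Replica*> members;
  // Best-effort broadcast to every member except the excluded pid.
  void broadcast(uint64_t position, const std::string& exclude) {
    for (Replica* r : members) {
      if (r->pid != exclude) {
        r->learn(position);
      }
    }
  }
};

// Learn on the local replica first (this is the only response we wait
// for), then broadcast to the rest of the network, excluding the local
// pid to avoid a duplicate LearnRequest.
bool learn(Replica* local, Network* network, uint64_t position) {
  bool ok = local->learn(position);
  network->broadcast(position, local->pid);
  return ok;
}
```

With this shape, the caller's future is tied only to the local replica's LearnResponse, so a subsequent local check cannot race with the learn.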
The logic of log::catchup() is as follows:

  check -> if missing, fill -> check -> if missing, fill -> ...

We always check whether a position is missing first. In the case of a write, since we have already broadcast a learned message before reaching this point, the learned message is guaranteed to be delivered before the check. Essentially, the log::catchup() here is simply like a CHECK.

What do you mean by "weird non-deterministic performance"? I think the only performance cost here, compared with the previous solution, is that we write to the local replica twice (once for the write and once for the learn). I will try to address that later using a checksum (which can also reduce the number of writes to one for remote replicas).


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27221
-----------------------------------------------------------


On Oct. 23, 2013, 8:04 p.m., Jie Yu wrote:

> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2013, 8:04 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch in a series of patches that implement a catch-up
> mechanism for the replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch (sorry that we were not able to break
> it into smaller patches):
> 
> 1) Pulled the original Coordinator logic out and divided it into several
> Paxos phases (see src/log/consensus.hpp). Instead of using blocking
> semantics, we implemented all the logic asynchronously.
> 
> 2) To ensure the liveness of a catch-upper, we implemented retry logic that
> bumps the proposal number. This also requires us to slightly change the
> existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica.
> Instead, introduced a catchup function (see src/log/catchup.hpp) to make
> sure the underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above).
> 
> This is joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am a2d8242 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest*
> --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
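The check-then-fill loop Jie describes in the reply above can be sketched as below. This is a minimal illustration, not the actual src/log/catchup.cpp code: `missing()` and `fill()` are hypothetical stand-ins for the real check (Paxos phase 1 against a quorum) and fill (a full Paxos round) operations, and the real implementation is asynchronous.

```cpp
// Sketch of the catch-up loop: for each position, run the cheap check
// first; fill (and a re-check) happens only when the position has not
// yet been learned. If a learned message already arrived, this
// degenerates into a pure check, as described in the reply.
#include <cassert>
#include <cstdint>
#include <set>

// Simplified stand-in for the local replica's learned state.
struct LocalReplica {
  std::set<uint64_t> learned;  // positions this replica has learned

  // check: has this position been learned yet?
  bool missing(uint64_t position) const {
    return learned.count(position) == 0;
  }

  // fill: run a (here, pretend) Paxos round so the position is learned.
  void fill(uint64_t position) {
    learned.insert(position);
  }
};

// catchup(): check -> if missing, fill -> check, for each position in
// [begin, end].
void catchup(LocalReplica* replica, uint64_t begin, uint64_t end) {
  for (uint64_t p = begin; p <= end; ++p) {
    if (replica->missing(p)) {    // check
      replica->fill(p);           // fill only when missing
      assert(!replica->missing(p));  // re-check after fill
    }
  }
}
```

In the real code the fill path must also handle contention, which is where item (2) above comes in: a rejected proposal causes the catch-upper to bump its proposal number and retry, which preserves liveness.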
