> On Oct. 19, 2013, 11:22 p.m., Benjamin Hindman wrote:
> > src/log/coordinator.cpp, lines 540-543
> > <https://reviews.apache.org/r/14631/diff/3/?file=364789#file364789line540>
> >
> >     But might be slow, or worse, cause really weird non-deterministic 
> > performance (e.g., every once in a while we re-run fill because the local 
> > replica hadn't yet processed the learn, giving weird performance blips). I 
> > think this part needs a little more thinking. What about sending a learn 
> > directly to the local replica in addition to broadcasting to the others? I 
> > could imagine another log::learn that also takes the replica pointer and 
> > only returns a future for once the local replica has received a 
> > LearnResponse but not the rest of the network. And if you don't want to 
> > send the LearnRequest twice to the local replica just filter that pid out 
> > when you broadcast to the network. And let's capture some TODOs somewhere 
> > about not writing the data twice (once for the WriteRequest and once for the 
> > WriteResponse) by using something like checksums.
> 
> Jie Yu wrote:
>     The logic of log::catchup() is as follows:
>     check -> if missing, fill -> check -> if missing, fill
>     
>     We will always check whether a position is missing first. In the case 
> of a write, since we've already broadcast a learned message before reaching 
> here, the learned message is guaranteed to be delivered before the check. 
> Essentially, the log::catchup() here is simply like a CHECK.
>     
>     What do you mean by "weird non-deterministic performance"? I guess the 
> only performance issue here compared to the previous solution is that we 
> will write to the local replica twice (write + learn). And I will try to solve 
> it by using a checksum later (which can also reduce the number of writes to 1 
> for remote replicas).

If log::catchup is just a CHECK, then how about we just make our intentions 
clear and check that Replica::missing returns false?
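
A minimal sketch of the difference, assuming a simplified, synchronous stand-in for the local replica. FakeReplica, catchup, and checkLearned below are hypothetical names for illustration only, not the actual classes or signatures in src/log:

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for the local replica: it only tracks which
// positions have been learned. The real replica is asynchronous.
class FakeReplica {
public:
  void learn(uint64_t position) { learned.insert(position); }

  // True if the replica has not yet learned the given position.
  bool missing(uint64_t position) const {
    return learned.count(position) == 0;
  }

private:
  std::set<uint64_t> learned;
};

// Sketch of "check -> if missing, fill -> check -> if missing, fill".
// `fill` is a hypothetical callback that makes the replica learn the
// position. Returns how many times fill was run.
template <typename Fill>
int catchup(FakeReplica& replica, uint64_t position, Fill fill,
            int maxAttempts = 2) {
  int fills = 0;
  while (replica.missing(position) && fills < maxAttempts) {
    fill(replica, position);
    ++fills;
  }
  return fills;
}

// The alternative suggested above: state the invariant directly instead
// of going through catchup.
inline void checkLearned(const FakeReplica& replica, uint64_t position) {
  assert(!replica.missing(position));
}
```

If the learned message has already been processed, catchup never runs fill and behaves like the explicit check; the explicit check just makes that expectation visible and fails loudly if it does not hold.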


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27221
-----------------------------------------------------------


On Oct. 23, 2013, 10:34 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Oct. 23, 2013, 10:34 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up 
> mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not 
> able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divided it into several 
> Paxos phases (see src/log/consensus.hpp). Instead of using blocking 
> semantics, we implemented all the logic asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented retry 
> logic that bumps the proposal number. This also requires us to slightly 
> change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, 
> introduced a catchup (see src/log/catchup.hpp) function to make sure the 
> underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am a2d8242 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/network.hpp d34cf78 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/messages/log.proto 3d5859f 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* 
> --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>
