Re: Review Request 14631: Catch-up Replicated Log 1: Making Coordinator Asynchronous.

Benjamin Hindman Sat, 19 Oct 2013 16:23:27 -0700

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14631/#review27221
-----------------------------------------------------------



Whew! Good stuff!

One high-level thought: you can change all the 'id' field names to 'proposal' 
in log.proto just don't change their numbers. ;)


src/Makefile.am
<https://reviews.apache.org/r/14631/#comment52936>

    Verify with Vinod Kone, but I think we're going to +2 indent for these 
lines. But thanks for doing the reformat!



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment52937>

    CATCHUPER!? s/CATCHUPER/CATCHUP/g ;)



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment52938>

    Technically, 'catch up' is two words (or hyphenated, i.e., 'catch-up') and 
'catchup' is a variant of 'ketchup'! ;) Not a huge deal for me if you don't 
want to all it 'recover', but let's at least use the correct 'catch up' in 
comments and other text (i.e., strings, documentation, etc). ;)
    
    Note, however, that I'd prefer we didn't do s/catchup/catchUp/.



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment52941>

    You need to say something about the lifetime of replica and network here. 
:( In particular, that you expect them to be alive until the future is 
satisfied (ready, failed, or discarded ... and even the discarded case is 
tricky because you could be using the pointers when someone discards the future 
themselves and therefore thinks the network and replica pointers can be 
deleted). :(



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment52939>

    Let's make this a TODO. We've got a RateLimiter in libprocess that would 
probably really easily help!



src/log/catchup.hpp
<https://reviews.apache.org/r/14631/#comment52942>

    Same comment about lifetime as above.



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52949>

    I think you can get away without the lambda::_1 now! And then you don't 
need to take the future below in filled which you don't use ...



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52945>

    Maybe a comment? I.e., "We discard these in 'finalize' so no more functions 
should get invoked via dispatch.".



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52946>

    else if?



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52948>

    Same as above for lambda::_1.



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52947>

    Same comment as above?



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52992>

    Future<Nothing> catchup(
        size_t quorum,
        ...
    
    Here and everywhere else please.



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52950>

    CatchupProcess* process = new CatchupProcess(
        quorum,
        replica,
        network,
        position,
        proposal);
    
    Or:
    
    CatchupProcess* process =
      new CatchupProcess(
          quorum,
          replica,
          network,
          position,
          proposal);
    
    Here and everywhere else please.



src/log/catchup.cpp
<https://reviews.apache.org/r/14631/#comment52954>

    My biggest concern with this code is the with the lifetime the Replica and 
Network pointers. If someone discards this future they can not also reliably 
delete their pointers immediately, they actually need to wait first for the 
processes to terminate (which they don't have references to!). I think the only 
safe way to handle this is to put a BLOCKING discarded callback on this future 
that actually waits for the process PID returned from spawn to terminate:
    
    ------------------------------------
    PID<CatchupProcess> pid = spawn(process, true);
    
    future.onDiscarded([pid] () {
      process::terminate(pid);
      process::wait(pid);
    });
    
    return future;
    ------------------------------------
    
    You can either decide to not do the terminate here since the process is 
doing it in 'initialize' or you can kill it from 'initialize'.
    
    This makes the discard operation be blocking, which is very very 
unfortunate, but at least it's safe. It's likely we should also consider how to 
change the semantics of Future::discard to capture these requirements without 
making the discard callback block. One short term solution might be to pass a 
boolean to Future::discard which signifies whether or not to wait for all 
callbacks to be invoked, so for people that "know" they don't need to wait they 
don't have to block.



src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment52952>

    Please call out the fields by putting them in quotes, i.e, the 'okay' field 
and the 'id' field. Here and everywhere else please!



src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment52953>

    Lifetime comment please, here and everywhere else.



src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment52993>

    What about:
    
    extern process::Future<PromiseResponse> promise(
        size_t quorum,
        Network* network,
        uint64_t proposal,
        const Option<uint64_t>& position = None());
    
    The comments will need to be adjusted, but the semantics should be okay. At 
the very least, it seems like 'position' should be the last argument even if 
you make it two functions.



src/log/consensus.hpp
<https://reviews.apache.org/r/14631/#comment52955>

    Why a hint only for fill?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52964>

    Maybe a comment explaining that we might get a quorum before we finish 
processing all responses so we need to discard the rest.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52957>

    Include failure or 'future discarded' message.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52961>

    Let's increment the responses received right away:
    
    responsesReceived += 1;
    
    if (!response.okay()) {
      ...



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52958>

    s/id/'id'/g (as commented on earlier).



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52965>

    How about making this an option? Then we don't have to do the > 0 check 
below.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52966>

    What about an 'if else' which bails if we've already received at least one 
NACK? This makes these semantics more explicit IMHO.
    
     ...
    } else if (highestNackProposal.isSome()) {
      // We still want to wait for more potential NACKs
      // so we can return the highest proposal number seen
      // but we don't care about any more ACKs.
      return;
    } else {
      ...



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52968>

    How about saying, "The position has been promised to us so the 'id' field 
should match the proposal we sent in the request."



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52959>

    Why not just:
    
    promise.set(response);



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52960>

    Use CopyFrom when you want to guarantee copy semantics. 



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52963>

    Maybe a comment reminding the reader that the remaining responses get 
discarded in 'finalize'. ;)



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52971>

    I know this is my old code, but I think in the interest of readability we 
should clean this up and just be redundant and explicit. First, how about 
making highestAckAction an Option. Then:
    
    } else if (response.action().has_performed()) {
      // An action has already been performed in this position, we
      // need to save the action with the highest proposal.
      if (highestAckAction.isNone() ||
          (response.action().performed() >
           highestAckAction.get().performed())) {
        highestAckAction = response.action();
      }
    }



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52967>

    How about saying:
    
    Received a response for a position that had previously been promised to 
some other proposer but an action had not been performed or learned. No need to 
do anything here, the position is not promised to us.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52972>

    Let's not increment here. I think it's nicer to see the first thing we do 
when we get a response is increment the number of responses (above).



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52969>

    if (highestNackProposal.isSome()) {
    
    } else {



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52973>

    if (highestActAction.isSome())



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52970>

    CopyFrom, ignore my old bad code. ;)



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52976>

    We tend to not use shortened variable names in the codebase. I think it's 
okay for Ack and Nack, but maybe s/numResponsesReceived/responsesReceived/?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52974>

    Ditto above.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52978>

    Option here too?
    
    highestNackProposal =
      std::max(highestNackProposal.get(response.id()), response.id());



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52979>

    Option?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52975>

    How about incrementing the responses above again? I like those semantics 
better.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52980>

    Could CHECK_SOME(highestEndPosition).



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52977>

    Same comment re: naming as above.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52981>

    The MergeFrom's should all likely be CopyFrom, my bad. :/



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52982>

    Do we really need a process for this?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52983>

    Great! ;) Just s/promising/'promising'/.



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52984>

    else if?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52985>

    Ha, 'catch-up', nice! ;)



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52986>

    s/catchup/log::catchup/



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52987>

    How about passing 'action' through the write and learn phases rather than 
making it an instance member?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52988>

    Do we really need/care to check the learned phase?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52989>

    s/bump/retry/ and s/p/highestNackProposal?



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52990>

    s/win/wins/



src/log/consensus.cpp
<https://reviews.apache.org/r/14631/#comment52991>

    s/(double)::random()/(double) ::random()/



src/log/coordinator.hpp
<https://reviews.apache.org/r/14631/#comment52994>

    Revert formatting fix please.



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment52995>

    virtual



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment52998>

    s/};/} state;/



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment52999>

    Kill (see above).



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment53002>

    Instead, let's have elect return an Option and if it's none then we'll 
assume that means we can retry. Let's keep failures exceptional. No control 
flow on failures! ;)



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment53001>

    Let's move these below the other ones which need to get called first. In 
general, we should organize the methods in the non-failure order so people can 
just read through the functions as though there were no asynchronous 
continuations. That will also let us easily port to C++11. ;)



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment53005>

    But might be slow, or worse, cause really weird non-deterministic 
performance (e.g., every once in a while we re-run fill because the local 
replica hadn't yet processed the learn giving weird performance blips. I think 
this part needs a little more thinking. What about sending a learn directly to 
the local replica in addition to broadcasting to the others? I could imagine 
another log::learn that also takes the replica pointer and only returns a 
future for once the local replica has received a LearnResponse but not the rest 
of the network. And if you don't want to send the LearnRequest twice to the 
local replica just filter that pid out when you broadcast to the network. And 
let's capture some TODOs some where about not writing the data twice (one for 
the WriteRequest and once for the WriteResponse) by using something like 
checksums.



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment53003>

    Eventually we'll want to propagate the future all the way out (even past 
the Log class). But not in this review, please no. ;)



src/log/coordinator.cpp
<https://reviews.apache.org/r/14631/#comment53004>

    I understand that returning an Option from CoordinatorProcess::elect is 
going to complicate the 'then' chained logic a bit, but give it a shot and 
let's check it out. I really do not like special casing the failure message!



src/log/replica.cpp
<https://reviews.apache.org/r/14631/#comment52996>

    How about:
    
    Highest proposal used by this coordinator.



src/log/replica.cpp
<https://reviews.apache.org/r/14631/#comment52997>

    s/performence/performance/



src/tests/log_tests.cpp
<https://reviews.apache.org/r/14631/#comment53006>

    How about a comment about why you do this _after_ the replicas start up?


- Benjamin Hindman


On Oct. 15, 2013, 7:49 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14631/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2013, 7:49 p.m.)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the first patch of a series of patches that implement a catch-up 
> mechanism for replicated log. See the following ticket for more details:
> https://issues.apache.org/jira/browse/MESOS-736
> 
> Here is a brief summary of this patch: (Sorry for the fact that we are not 
> able to break it into smaller patches :()
> 
> 1) Pulled the original Coordinator logic out and divides it into several 
> Paxos phases (see src/log/consensus.hpp). Instead of using a blocking 
> semantics, we implemented all the logics asynchronously.
> 
> 2) In order to ensure the liveness of a catch-uper, we implemented a retry 
> logic by bumping the proposal number. This also requires us to slightly 
> change the existing replica protocol.
> 
> 3) Made the "fill" operation independent of the underlying replica. Instead, 
> introduced a catchup (see src/log/catchup.hpp) function to make sure the 
> underlying local replica has learned each write.
> 
> 4) Modified the log tests to adapt to the new semantics (see (3) above)
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am a2d8242 
>   src/log/catchup.hpp PRE-CREATION 
>   src/log/catchup.cpp PRE-CREATION 
>   src/log/consensus.hpp PRE-CREATION 
>   src/log/consensus.cpp PRE-CREATION 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 59a6ff3 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/14631/diff/
> 
> 
> Testing
> -------
> 
> bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* 
> --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

Re: Review Request 14631: Catch-up Replicated Log 1: Making Coordinator Asynchronous.

Reply via email to