----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/14631/#review27221 -----------------------------------------------------------
Whew! Good stuff! One high-level thought: you can change all the 'id' field names to 'proposal' in log.proto just don't change their numbers. ;) src/Makefile.am <https://reviews.apache.org/r/14631/#comment52936> Verify with Vinod Kone, but I think we're going to +2 indent for these lines. But thanks for doing the reformat! src/log/catchup.hpp <https://reviews.apache.org/r/14631/#comment52937> CATCHUPER!? s/CATCHUPER/CATCHUP/g ;) src/log/catchup.hpp <https://reviews.apache.org/r/14631/#comment52938> Technically, 'catch up' is two words (or hyphenated, i.e., 'catch-up') and 'catchup' is a variant of 'ketchup'! ;) Not a huge deal for me if you don't want to all it 'recover', but let's at least use the correct 'catch up' in comments and other text (i.e., strings, documentation, etc). ;) Note, however, that I'd prefer we didn't do s/catchup/catchUp/. src/log/catchup.hpp <https://reviews.apache.org/r/14631/#comment52941> You need to say something about the lifetime of replica and network here. :( In particular, that you expect them to be alive until the future is satisfied (ready, failed, or discarded ... and even the discarded case is tricky because you could be using the pointers when someone discards the future themselves and therefore thinks the network and replica pointers can be deleted). :( src/log/catchup.hpp <https://reviews.apache.org/r/14631/#comment52939> Let's make this a TODO. We've got a RateLimiter in libprocess that would probably really easily help! src/log/catchup.hpp <https://reviews.apache.org/r/14631/#comment52942> Same comment about lifetime as above. src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52949> I think you can get away without the lambda::_1 now! And then you don't need to take the future below in filled which you don't use ... src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52945> Maybe a comment? I.e., "We discard these in 'finalize' so no more functions should get invoked via dispatch.". src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52946> else if? src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52948> Same as above for lambda::_1. src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52947> Same comment as above? src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52992> Future<Nothing> catchup( size_t quorum, ... Here and everywhere else please. src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52950> CatchupProcess* process = new CatchupProcess( quorum, replica, network, position, proposal); Or: CatchupProcess* process = new CatchupProcess( quorum, replica, network, position, proposal); Here and everywhere else please. src/log/catchup.cpp <https://reviews.apache.org/r/14631/#comment52954> My biggest concern with this code is the with the lifetime the Replica and Network pointers. If someone discards this future they can not also reliably delete their pointers immediately, they actually need to wait first for the processes to terminate (which they don't have references to!). I think the only safe way to handle this is to put a BLOCKING discarded callback on this future that actually waits for the process PID returned from spawn to terminate: ------------------------------------ PID<CatchupProcess> pid = spawn(process, true); future.onDiscarded([pid] () { process::terminate(pid); process::wait(pid); }); return future; ------------------------------------ You can either decide to not do the terminate here since the process is doing it in 'initialize' or you can kill it from 'initialize'. This makes the discard operation be blocking, which is very very unfortunate, but at least it's safe. It's likely we should also consider how to change the semantics of Future::discard to capture these requirements without making the discard callback block. One short term solution might be to pass a boolean to Future::discard which signifies whether or not to wait for all callbacks to be invoked, so for people that "know" they don't need to wait they don't have to block. src/log/consensus.hpp <https://reviews.apache.org/r/14631/#comment52952> Please call out the fields by putting them in quotes, i.e, the 'okay' field and the 'id' field. Here and everywhere else please! src/log/consensus.hpp <https://reviews.apache.org/r/14631/#comment52953> Lifetime comment please, here and everywhere else. src/log/consensus.hpp <https://reviews.apache.org/r/14631/#comment52993> What about: extern process::Future<PromiseResponse> promise( size_t quorum, Network* network, uint64_t proposal, const Option<uint64_t>& position = None()); The comments will need to be adjusted, but the semantics should be okay. At the very least, it seems like 'position' should be the last argument even if you make it two functions. src/log/consensus.hpp <https://reviews.apache.org/r/14631/#comment52955> Why a hint only for fill? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52964> Maybe a comment explaining that we might get a quorum before we finish processing all responses so we need to discard the rest. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52957> Include failure or 'future discarded' message. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52961> Let's increment the responses received right away: responsesReceived += 1; if (!response.okay()) { ... src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52958> s/id/'id'/g (as commented on earlier). src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52965> How about making this an option? Then we don't have to do the > 0 check below. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52966> What about an 'if else' which bails if we've already received at least one NACK? This makes these semantics more explicit IMHO. ... } else if (highestNackProposal.isSome()) { // We still want to wait for more potential NACKs // so we can return the highest proposal number seen // but we don't care about any more ACKs. return; } else { ... src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52968> How about saying, "The position has been promised to us so the 'id' field should match the proposal we sent in the request." src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52959> Why not just: promise.set(response); src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52960> Use CopyFrom when you want to guarantee copy semantics. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52963> Maybe a comment reminding the reader that the remaining responses get discarded in 'finalize'. ;) src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52971> I know this is my old code, but I think in the interest of readability we should clean this up and just be redundant and explicit. First, how about making highestAckAction an Option. Then: } else if (response.action().has_performed()) { // An action has already been performed in this position, we // need to save the action with the highest proposal. if (highestAckAction.isNone() || (response.action().performed() > highestAckAction.get().performed())) { highestAckAction = response.action(); } } src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52967> How about saying: Received a response for a position that had previously been promised to some other proposer but an action had not been performed or learned. No need to do anything here, the position is not promised to us. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52972> Let's not increment here. I think it's nicer to see the first thing we do when we get a response is increment the number of responses (above). src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52969> if (highestNackProposal.isSome()) { } else { src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52973> if (highestActAction.isSome()) src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52970> CopyFrom, ignore my old bad code. ;) src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52976> We tend to not use shortened variable names in the codebase. I think it's okay for Ack and Nack, but maybe s/numResponsesReceived/responsesReceived/? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52974> Ditto above. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52978> Option here too? highestNackProposal = std::max(highestNackProposal.get(response.id()), response.id()); src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52979> Option? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52975> How about incrementing the responses above again? I like those semantics better. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52980> Could CHECK_SOME(highestEndPosition). src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52977> Same comment re: naming as above. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52981> The MergeFrom's should all likely be CopyFrom, my bad. :/ src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52982> Do we really need a process for this? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52983> Great! ;) Just s/promising/'promising'/. src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52984> else if? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52985> Ha, 'catch-up', nice! ;) src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52986> s/catchup/log::catchup/ src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52987> How about passing 'action' through the write and learn phases rather than making it an instance member? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52988> Do we really need/care to check the learned phase? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52989> s/bump/retry/ and s/p/highestNackProposal? src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52990> s/win/wins/ src/log/consensus.cpp <https://reviews.apache.org/r/14631/#comment52991> s/(double)::random()/(double) ::random()/ src/log/coordinator.hpp <https://reviews.apache.org/r/14631/#comment52994> Revert formatting fix please. src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment52995> virtual src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment52998> s/};/} state;/ src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment52999> Kill (see above). src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment53002> Instead, let's have elect return an Option and if it's none then we'll assume that means we can retry. Let's keep failures exceptional. No control flow on failures! ;) src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment53001> Let's move these below the other ones which need to get called first. In general, we should organize the methods in the non-failure order so people can just read through the functions as though there were no asynchronous continuations. That will also let us easily port to C++11. ;) src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment53005> But might be slow, or worse, cause really weird non-deterministic performance (e.g., every once in a while we re-run fill because the local replica hadn't yet processed the learn giving weird performance blips. I think this part needs a little more thinking. What about sending a learn directly to the local replica in addition to broadcasting to the others? I could imagine another log::learn that also takes the replica pointer and only returns a future for once the local replica has received a LearnResponse but not the rest of the network. And if you don't want to send the LearnRequest twice to the local replica just filter that pid out when you broadcast to the network. And let's capture some TODOs some where about not writing the data twice (one for the WriteRequest and once for the WriteResponse) by using something like checksums. src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment53003> Eventually we'll want to propagate the future all the way out (even past the Log class). But not in this review, please no. ;) src/log/coordinator.cpp <https://reviews.apache.org/r/14631/#comment53004> I understand that returning an Option from CoordinatorProcess::elect is going to complicate the 'then' chained logic a bit, but give it a shot and let's check it out. I really do not like special casing the failure message! src/log/replica.cpp <https://reviews.apache.org/r/14631/#comment52996> How about: Highest proposal used by this coordinator. src/log/replica.cpp <https://reviews.apache.org/r/14631/#comment52997> s/performence/performance/ src/tests/log_tests.cpp <https://reviews.apache.org/r/14631/#comment53006> How about a comment about why you do this _after_ the replicas start up? - Benjamin Hindman On Oct. 15, 2013, 7:49 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/14631/ > ----------------------------------------------------------- > > (Updated Oct. 15, 2013, 7:49 p.m.) > > > Review request for mesos and Benjamin Hindman. > > > Repository: mesos-git > > > Description > ------- > > This is the first patch of a series of patches that implement a catch-up > mechanism for replicated log. See the following ticket for more details: > https://issues.apache.org/jira/browse/MESOS-736 > > Here is a brief summary of this patch: (Sorry for the fact that we are not > able to break it into smaller patches :() > > 1) Pulled the original Coordinator logic out and divides it into several > Paxos phases (see src/log/consensus.hpp). Instead of using a blocking > semantics, we implemented all the logics asynchronously. > > 2) In order to ensure the liveness of a catch-uper, we implemented a retry > logic by bumping the proposal number. This also requires us to slightly > change the existing replica protocol. > > 3) Made the "fill" operation independent of the underlying replica. Instead, > introduced a catchup (see src/log/catchup.hpp) function to make sure the > underlying local replica has learned each write. > > 4) Modified the log tests to adapt to the new semantics (see (3) above) > > This is a joint work with Yan Xu. > > > Diffs > ----- > > src/Makefile.am a2d8242 > src/log/catchup.hpp PRE-CREATION > src/log/catchup.cpp PRE-CREATION > src/log/consensus.hpp PRE-CREATION > src/log/consensus.cpp PRE-CREATION > src/log/coordinator.hpp 3f6fb7c > src/log/coordinator.cpp 6e6466f > src/log/replica.hpp d1f5ead > src/log/replica.cpp 59a6ff3 > src/tests/log_tests.cpp ff5f86c > > Diff: https://reviews.apache.org/r/14631/diff/ > > > Testing > ------- > > bin/mesos-tests.sh --gtest_filter=*CoordinatorTest*:*LogTest*:*ReplicaTest* > --gtest_repeat=100 > > > Thanks, > > Jie Yu > >
