----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/15802/#review29966 -----------------------------------------------------------
src/log/log.hpp <https://reviews.apache.org/r/15802/#comment57464> Why s/Timeout/Duration/? src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57466> Pass this into LogProcess::watch instead. src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57644> Rather than holding on to the LogProcess*, how about we make Log::recover() return a Future<Shared<Replica>> and that's how we get the replica. That way, instead of doing process->replica-> we can just have a Shared<Replica>. Then LogProcess::finalize doesn't need to "wait" for Readers as your comment suggests above (or writers, as we'll see below). And does LogProcess::finalize really need to wait for the Shared<Replica> to be owned? src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57645> How about we just grab 'quorum' and 'network' like above get the Shared<Replica> from Log::recover(). Again, this removes our dependency on LogProcess*. I can imagine the LogReaderProcess and LogWriterProcess constructors initiate recovery on Log::recover() and save the returned future and use that to gate anything else (and obviously the future must have been satisfied to use 'replica' since that's how we get it). src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57465> s/(UPID)replica/(UPID) replica/ src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57641> Some comments on why you need to wait for these would be great. src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57642> When would we want '!strict' with the log? I see you commented on why we might not want strict for the replica when writing tests, but if we are creating a log, won't we always want to recover? src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57467> How about a CHECK(replica.unique())? Eventually this is what "release" is for correct? If yes, maybe a TODO? src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57468> I'd prefer to keep these "timeouts" be of type Timeout not Duration (in all these methods). src/log/log.cpp <https://reviews.apache.org/r/15802/#comment57469> You can just do 'future.await(timeout.remaining())' here. src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57488> For a future that we don't control we should transition our code to be more robust and move to a model where we don't expect them _not_ to be discarded. Instead, let's do something like: CHECK(!future.isPending()); if (!future.isReady()) { promise.fail("Failed to ...: " + future.isFailed() ? future.failure() : "future discarded"; ...; } See examples in master/master.cpp. src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57503> Let's add helpers for Metadata::Status in common/type_utils.hpp so you can just do: LOG(INFO) << "Received ..." << response.status() << " status"; This will also enable us to do: stringify(response.status()) src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57501> How about just std::min? I think it will read better: if (lowestBeginPosition.isNone()) { lowestBeginPosition = response.begin(); } lowestBeginPosition = std::min(lowestBeginPosition, response.begin()); Also, what about defining a helper for doing "min" with options: template <typename T> Option<T> min(const Option<T>& left, const Option<T>& right) { if (left.isSome() && right.isSome()) { return std::min(left.get(), right.get()); } else if (left.isSome()) { return left.get(); } else if (right.isSome()) { return right.get(); } return None(); } This will make the code just: lowestBeginPosition = min(lowestBeginPosition, response.begin()); src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57502> How about just std::max? src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57500> This switch needs a lot more explanation. In particular, how do we know we've covered all of the cases? For example, what if we get a quorum of voting and empty? Why don't we do anything in that case? src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57496> s/re-calculate/recalculate/ src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57497> s/these/this/ src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57492> s/re-gained/regained/ src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57494> s/Trying to re-gain/Try to regain/ src/log/recover.cpp <https://reviews.apache.org/r/15802/#comment57495> Why the delay? src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57462> s/replica info (Record::ReplicaInfo)/metadata (Record::Metadata)/ src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57486> But isn't this the new code? Do we want old code to go into VOTING immediately? src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57463> s/replica info/metadata/ src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57475> Why _metadata instead of metadata_? Also, any reason not to start with '= metadata' and then only do 'set_status'? src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57476> Ditto comments above. src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57470> What about only having strict apply when the log is empty? As in, if the log is empty and strict is false then just put the replica in voting, otherwise, always stay in non-voting until updated (i.e., via log::recover). I'd like to simplify these semantics with respect to the external log::recover call. src/log/replica.cpp <https://reviews.apache.org/r/15802/#comment57471> Please put '=' on previous line. src/tests/log_tests.cpp <https://reviews.apache.org/r/15802/#comment57472> You can only run these tests if Mesos is built with Java. For that reason most ZK tests have been put in a separate file and conditionally included in src/Makefile.am. Some tests might have #define'd included tests too. Take a look in src/tests/* src/tests/log_tests.cpp <https://reviews.apache.org/r/15802/#comment57473> This should be LOG, then you don't need to worry about checking 'tests::flags.verbose' (unless I'm missing something?). src/tests/log_tests.cpp <https://reviews.apache.org/r/15802/#comment57474> It's a bit too bad that this is copied code from TemporaryDirectoryTest. Any way you can see sharing this stuff? - Benjamin Hindman On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/15802/ > ----------------------------------------------------------- > > (Updated Dec. 5, 2013, 7:28 p.m.) > > > Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang > Yan Xu. > > > Bugs: MESOS-736 > https://issues.apache.org/jira/browse/MESOS-736 > > > Repository: mesos-git > > > Description > ------- > > This is the last patch of a series of patches that implement catch-up > replicated log. > > Here is summary of this patch: > 1) Introduced RecoverRequest/RecoverResponse. > 2) Used Metadata in replica code. > 3) A 2-phase empty log start-up algorithm. > 4) Added a test to test two competing recover processes. > > This is a joint work with Yan Xu. > > > Diffs > ----- > > src/Makefile.am abef3d2 > src/java/jni/org_apache_mesos_Log.cpp 36c636d > src/log/coordinator.hpp 3f6fb7c > src/log/coordinator.cpp 6e6466f > src/log/log.hpp 77edc7a > src/log/log.cpp d057925 > src/log/recover.hpp PRE-CREATION > src/log/recover.cpp PRE-CREATION > src/log/replica.hpp d1f5ead > src/log/replica.cpp 82c2157 > src/messages/log.proto e6460ab > src/tests/log_tests.cpp ff5f86c > > Diff: https://reviews.apache.org/r/15802/diff/ > > > Testing > ------- > > make check > > bin/mesos-tests.sh > --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* > --gtest_repeat=100 > > > Thanks, > > Jie Yu > >
