Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Benjamin Hindman Mon, 09 Dec 2013 22:39:01 -0800

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/15802/#review29966
-----------------------------------------------------------




src/log/log.hpp
<https://reviews.apache.org/r/15802/#comment57464>

    Why s/Timeout/Duration/?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57466>

    Pass this into LogProcess::watch instead.



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57644>

    Rather than holding on to the LogProcess*, how about we make Log::recover() 
return a Future<Shared<Replica>> and that's how we get the replica. That way, 
instead of doing process->replica-> we can just have a Shared<Replica>. Then 
LogProcess::finalize doesn't need to "wait" for Readers as your comment 
suggests above (or writers, as we'll see below). And does LogProcess::finalize 
really need to wait for the Shared<Replica> to be owned?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57645>

    How about we just grab 'quorum' and 'network' like above get the 
Shared<Replica> from Log::recover(). Again, this removes our dependency on 
LogProcess*. I can imagine the LogReaderProcess and LogWriterProcess 
constructors initiate recovery on Log::recover() and save the returned future 
and use that to gate anything else (and obviously the future must have been 
satisfied to use 'replica' since that's how we get it).



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57465>

    s/(UPID)replica/(UPID) replica/



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57641>

    Some comments on why you need to wait for these would be great.



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57642>

    When would we want '!strict' with the log? I see you commented on why we 
might not want strict for the replica when writing tests, but if we are 
creating a log, won't we always want to recover?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57467>

    How about a CHECK(replica.unique())? Eventually this is what "release" is 
for correct? If yes, maybe a TODO?



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57468>

    I'd prefer to keep these "timeouts" be of type Timeout not Duration (in all 
these methods).



src/log/log.cpp
<https://reviews.apache.org/r/15802/#comment57469>

    You can just do 'future.await(timeout.remaining())' here.



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57488>

    For a future that we don't control we should transition our code to be more 
robust and move to a model where we don't expect them _not_ to be discarded. 
Instead, let's do something like:
    
    CHECK(!future.isPending());
    
    if (!future.isReady()) {
      promise.fail("Failed to ...: " + future.isFailed() ? future.failure() : 
"future discarded";
      ...;
    }
    
    See examples in master/master.cpp.



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57503>

    Let's add helpers for Metadata::Status in common/type_utils.hpp so you can 
just do:
    
    LOG(INFO) << "Received ..." << response.status() << " status";
    
    This will also enable us to do: stringify(response.status())



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57501>

    How about just std::min? I think it will read better:
    
    if (lowestBeginPosition.isNone()) {
      lowestBeginPosition = response.begin();
    }
    
    lowestBeginPosition = std::min(lowestBeginPosition, response.begin());
    
    Also, what about defining a helper for doing "min" with options:
    
    template <typename T>
    Option<T> min(const Option<T>& left, const Option<T>& right)
    {
      if (left.isSome() && right.isSome()) {
        return std::min(left.get(), right.get());
      } else if (left.isSome()) {
        return left.get();
      } else if (right.isSome()) {
        return right.get();
      }
      return None();
    }
    
    This will make the code just:
    
    lowestBeginPosition = min(lowestBeginPosition, response.begin());



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57502>

    How about just std::max?



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57500>

    This switch needs a lot more explanation. In particular, how do we know 
we've covered all of the cases? For example, what if we get a quorum of voting 
and empty? Why don't we do anything in that case?



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57496>

    s/re-calculate/recalculate/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57497>

    s/these/this/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57492>

    s/re-gained/regained/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57494>

    s/Trying to re-gain/Try to regain/



src/log/recover.cpp
<https://reviews.apache.org/r/15802/#comment57495>

    Why the delay?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57462>

    s/replica info (Record::ReplicaInfo)/metadata (Record::Metadata)/



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57486>

    But isn't this the new code? Do we want old code to go into VOTING 
immediately?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57463>

    s/replica info/metadata/



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57475>

    Why _metadata instead of metadata_? Also, any reason not to start with '= 
metadata' and then only do 'set_status'?



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57476>

    Ditto comments above.



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57470>

    What about only having strict apply when the log is empty? As in, if the 
log is empty and strict is false then just put the replica in voting, 
otherwise, always stay in non-voting until updated (i.e., via log::recover). 
I'd like to simplify these semantics with respect to the external log::recover 
call.



src/log/replica.cpp
<https://reviews.apache.org/r/15802/#comment57471>

    Please put '=' on previous line.



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57472>

    You can only run these tests if Mesos is built with Java. For that reason 
most ZK tests have been put in a separate file and conditionally included in 
src/Makefile.am. Some tests might have #define'd included tests too. Take a 
look in src/tests/*



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57473>

    This should be LOG, then you don't need to worry about checking 
'tests::flags.verbose' (unless I'm missing something?).



src/tests/log_tests.cpp
<https://reviews.apache.org/r/15802/#comment57474>

    It's a bit too bad that this is copied code from TemporaryDirectoryTest. 
Any way you can see sharing this stuff?


- Benjamin Hindman


On Dec. 5, 2013, 7:28 p.m., Jie Yu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/15802/
> -----------------------------------------------------------
> 
> (Updated Dec. 5, 2013, 7:28 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman, Ben Mahler, Vinod Kone, and Jiang 
> Yan Xu.
> 
> 
> Bugs: MESOS-736
>     https://issues.apache.org/jira/browse/MESOS-736
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> This is the last patch of a series of patches that implement catch-up 
> replicated log.
> 
> Here is summary of this patch:
> 1) Introduced RecoverRequest/RecoverResponse.
> 2) Used Metadata in replica code.
> 3) A 2-phase empty log start-up algorithm.
> 4) Added a test to test two competing recover processes.
> 
> This is a joint work with Yan Xu.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am abef3d2 
>   src/java/jni/org_apache_mesos_Log.cpp 36c636d 
>   src/log/coordinator.hpp 3f6fb7c 
>   src/log/coordinator.cpp 6e6466f 
>   src/log/log.hpp 77edc7a 
>   src/log/log.cpp d057925 
>   src/log/recover.hpp PRE-CREATION 
>   src/log/recover.cpp PRE-CREATION 
>   src/log/replica.hpp d1f5ead 
>   src/log/replica.cpp 82c2157 
>   src/messages/log.proto e6460ab 
>   src/tests/log_tests.cpp ff5f86c 
> 
> Diff: https://reviews.apache.org/r/15802/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> bin/mesos-tests.sh 
> --gtest_filter=LogTest.*:ReplicaTest.*:CoordinatorTest.*:LogZooKeeperTest.* 
> --gtest_repeat=100
> 
> 
> Thanks,
> 
> Jie Yu
> 
>

Re: Review Request 15802: Catch-up Replicated Log 3: Added log recovery support.

Reply via email to