Re: Review Request 17686: Updated Mesos to use new libprocess discard semantics.

Jie Yu Wed, 19 Feb 2014 16:12:12 -0800


> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > We scanned all the log-related code. There are a few places that need to be
> > taken care of.
> > 
> > Log::Writer::append/truncate
> > Log::Reader::read
> > 
> > In the above functions, if timeout happens, we'll invoke 
> > 'future.discard()'. Should we wait for 'future' to become DISCARDED before 
> > we return None()? Maybe a TODO there?
> > 
> > LogReaderProcess/LogWriterProcess::recover()
> > 
> > Should we register 'onDiscard' callback on promise->future() and do 
> > 'promise->discard()' if we detect a discard attempt from the user?
> 
> Benjamin Hindman wrote:
>     We shouldn't need to do anything for Log::Writer::append/truncate or
> Log::Reader::read since those functions don't return a future. The underlying
> functions LogWriterProcess::append/truncate and LogReaderProcess::read just
> chain futures, so a Future::discard on whatever they return should propagate
> through (unlike the cgroups code, where we return a future from a promise and
> don't chain or associate that promise with any other asynchronous calls that
> are made).
>     
>     The reason why I didn't wait for the completion of the future in 
> Log::Writer::* and Log::Reader::* after we do a Future::discard is because we 
> weren't waiting before (well, we couldn't technically wait before since 
> discarded happened immediately!).
>     
>     I've added a TODO to Log*Process::recover to register onDiscard callbacks.


> We shouldn't need to do anything for Log::Writer::append/truncate or
> Log::Reader::read since those functions don't return a future.

In my opinion, even though those functions do not return futures, users expect
that once the function has returned (say, with None()), they can immediately
start or restart an operation. The new operation may then overlap with the
previous operation that is still being cancelled.
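One way to meet that expectation would be to request the discard and then wait
for the in-flight operation to actually stop before returning None(). The
C++17 sketch below uses only the standard library; Operation, startOperation
and runWithTimeout are hypothetical names, and the atomic flag merely stands in
for a discard request. It is not the libprocess API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <memory>
#include <optional>
#include <thread>

// Hypothetical stand-in for an asynchronous operation (NOT the libprocess
// API): the worker polls a shared "discard requested" flag and stops early
// once it is set.
struct Operation {
  std::shared_ptr<std::atomic<bool>> discardRequested =
      std::make_shared<std::atomic<bool>>(false);
  std::future<std::optional<int>> result;  // empty optional == stopped early
};

// Starts a hypothetical operation that needs roughly `work` to complete.
Operation startOperation(std::chrono::milliseconds work) {
  Operation op;
  auto flag = op.discardRequested;
  op.result = std::async(std::launch::async, [flag, work]() -> std::optional<int> {
    const auto deadline = std::chrono::steady_clock::now() + work;
    while (std::chrono::steady_clock::now() < deadline) {
      if (flag->load()) {
        return std::nullopt;  // acknowledge the discard request and stop
      }
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    return 42;  // the operation completed normally
  });
  return op;
}

// Waits up to `timeout` for the result. On timeout it requests a discard
// (analogous to future.discard()) and then blocks until the operation has
// actually stopped, so the caller may start a new operation right away.
std::optional<int> runWithTimeout(Operation op, std::chrono::milliseconds timeout) {
  if (op.result.wait_for(timeout) == std::future_status::ready) {
    return op.result.get();
  }
  op.discardRequested->store(true);
  op.result.wait();  // do not return while the old operation is still running
  assert(op.result.wait_for(std::chrono::seconds(0)) == std::future_status::ready);
  return std::nullopt;
}
```

With this ordering, runWithTimeout(startOperation(std::chrono::milliseconds(10000)),
std::chrono::milliseconds(10)) returns an empty optional only after the worker has
observed the discard request, so a caller that immediately starts a new operation
cannot overlap with the cancelled one. The trade-off is that the caller blocks for
however long the operation takes to honor the discard.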

But I agree with you that we should avoid large code motions in this patch.
We can simply keep the existing semantics.


> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > src/log/catchup.cpp, line 235
> > <https://reviews.apache.org/r/17686/diff/3/?file=470407#file470407line235>
> >
> >     We have a hard time understanding why you are changing the logic here.
> > It seems that the timer you created here will get fired no matter what. What
> > if the 'catching' operation succeeds?
> >
> >     IIUC, except for the 'finalize' function, you don't need to change
> > anything here.
> 
> Benjamin Hindman wrote:
>     With the old semantics, calling 'timedout' would do 'catching.discard()',
> which would cause 'discarded' to get invoked, which would restart 'catchup'.
> There was a lot that was weird about this, IMHO:
>
>     (1) We used 'catching.discard()' to imply a timeout, and even the code in
> 'discarded' mentions the timeout, but that discard could have come from
> 'log::catchup'!
>     (2) Given (1), if 'log::catchup' actually discarded the future, we simply
> tried again! :(
>
>     So now, when we time out, we simply start another 'log::catchup'.
> Note that we don't wait for the old 'log::catchup' to complete, just as we
> didn't before. In addition, a discarded event now properly propagates, and in
> this case I chose to propagate it as an error.
>
>     I did notice here that I should really do 'catching.discard()' in
> 'timedout' and then skip old attempts at 'log::catchup' in 'discarded', so I'm
> keeping this issue open for you to take another look.

Regarding (1), we thought about what the right discard semantics should be.
Here is what we came up with:

We should maintain the following invariant in our code base:

If promise.future().hasDiscard() is false, we should not do promise.discard().

This can greatly simplify reasoning about the discard semantics.

In this case, we should never assume that log::catchup will get discarded 
internally.

If an async operation decides to terminate before completion, that should be a 
failure (i.e., the future will be set to FAILED instead of DISCARDED).
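The invariant and the failure rule can be modeled with a small single-threaded
C++ sketch. State, Shared, Future, Promise and step are hypothetical names
chosen to mirror the discussion (hasDiscard, discard); this is not the
libprocess implementation. Promise::discard asserts the invariant, and the
producer reports an internal early termination as FAILED rather than DISCARDED.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical single-threaded model of a libprocess-style future.
// This is NOT the libprocess implementation.
enum class State { PENDING, READY, FAILED, DISCARDED };

struct Shared {
  State state = State::PENDING;
  bool discardRequested = false;  // set only by a consumer of the future
  int value = 0;
  std::string failure;
};

class Future {
 public:
  explicit Future(std::shared_ptr<Shared> shared) : shared_(std::move(shared)) {}

  // Consumer side: ask the producer to stop; this does not change the state.
  void discard() {
    if (shared_->state == State::PENDING) {
      shared_->discardRequested = true;
    }
  }

  bool hasDiscard() const { return shared_->discardRequested; }
  State state() const { return shared_->state; }

 private:
  std::shared_ptr<Shared> shared_;
};

class Promise {
 public:
  Future future() const { return Future(shared_); }

  bool set(int value) {
    if (shared_->state != State::PENDING) return false;
    shared_->value = value;
    shared_->state = State::READY;
    return true;
  }

  bool fail(const std::string& message) {
    if (shared_->state != State::PENDING) return false;
    shared_->failure = message;
    shared_->state = State::FAILED;
    return true;
  }

  // Producer side. Invariant: never discard unless a discard was requested.
  bool discard() {
    assert(shared_->discardRequested && "discard() without a discard request");
    if (shared_->state != State::PENDING) return false;
    shared_->state = State::DISCARDED;
    return true;
  }

 private:
  std::shared_ptr<Shared> shared_ = std::make_shared<Shared>();
};

// Hypothetical producer step: honor a discard request if there is one;
// otherwise report an early internal termination as a failure, not a discard.
void step(Promise& promise, bool terminatedEarly) {
  if (promise.future().hasDiscard()) {
    promise.discard();
  } else if (terminatedEarly) {
    promise.fail("operation terminated before completion");
  } else {
    promise.set(42);
  }
}
```

Under this rule, a consumer that observes DISCARDED knows the discard was its
own request, so a handler like 'discarded' in catchup.cpp would not need to
guess whether a discard meant a timeout or an internal abort.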


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17686/#review34246
-----------------------------------------------------------


On Feb. 18, 2014, 8:46 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17686/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 8:46 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, Ian Downes, Jie Yu, Niklas 
> Nielsen, TILL TOENSHOFF, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/java/jni/org_apache_mesos_state_AbstractState.cpp 
> 2ee0b1b631b80ec783e6bce683cdeaa77e56b2aa 
>   src/linux/cgroups.cpp 8ac25993886f5092fe6a58abdddbbb71e02911af 
>   src/log/catchup.cpp 59facbf9b5d065ba5066b6b443db4ee8e05ee33b 
>   src/log/consensus.cpp b89673a3b8f233e901eaf9ae69a9979099f4eb73 
>   src/log/log.cpp e83f822af86a2389e2b1abab9489713cb59838c2 
>   src/log/recover.cpp d06e5ada714ba0e5359ff1d3381edb6d526c6ad3 
>   src/master/detector.cpp 7e10433013b9415ec73c388e8dc69ab0989cdbc2 
>   src/master/registrar.cpp 915885a160f790399e8185c28c6e6555af1ee76e 
>   src/sasl/authenticatee.hpp f1a677f8aed0979f958e51f85e0a8210a03bd343 
>   src/sasl/authenticator.hpp 1478f6771b424555c34586a0d61f208dc15b0e7d 
>   src/slave/gc.hpp 328aa315d8ebd7ea5d05b57626ea4dbfc2206270 
>   src/slave/gc.cpp 405350bf8f498d2e59e9e6b4c4c19b7bdaa974de 
>   src/zookeeper/contender.cpp 6710da4e64fc0a43c1eabfc0f39fb0133c13df14 
>   src/zookeeper/group.cpp 793763e8312676f3b7cceb54bbad0337c8445ea7 
> 
> Diff: https://reviews.apache.org/r/17686/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>
