> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > We scanned all the log related code. There are a few places that need to be
> > taken care of.
> >
> > Log::Writer::append/truncate
> > Log::Reader::read
> >
> > In the above functions, if timeout happens, we'll invoke
> > 'future.discard()'. Should we wait for 'future' to become DISCARDED before
> > we return None()? Maybe a TODO there?
> >
> > LogReaderProcess/LogWriterProcess::recover()
> >
> > Should we register 'onDiscard' callback on promise->future() and do
> > 'promise->discard()' if we detect a discard attempt from the user?
>
> Benjamin Hindman wrote:
> We shouldn't need to do anything for Log::Writer::append/truncate or
> Log::Writer::read since those functions don't return a future. The underlying
> functions LogWriterProcess::append/truncate and LogReaderProcess::read just
> chain futures so a Future::discard on what ever they return should propagate
> through (unlike the cgroups code where we return a future from a promise and
> don't chain or associate that promise with any other asynchronous calls that
> are made).
>
> The reason why I didn't wait for the completion of the future in
> Log::Writer::* and Log::Reader::* after we do a Future::discard is because we
> weren't waiting before (well, we couldn't technically wait before since
> discarded happened immediately!).
>
> I've added a TODO to Log*Process::recover to register onDiscard callbacks.

> We shouldn't need to do anything for Log::Writer::append/truncate or
> Log::Writer::read since those functions don't return a future.

My opinion is: even if those functions do not return futures, users expect that once the function has returned (say, None()), they can immediately start or restart an operation. The new operation may well overlap with the previous operation that is still being cancelled.

But I agree with you that we should not do a lot of code motion in this patch. We can just maintain the same semantics.


> On Feb. 12, 2014, 11:43 p.m., Jie Yu wrote:
> > src/log/catchup.cpp, line 235
> > <https://reviews.apache.org/r/17686/diff/3/?file=470407#file470407line235>
> >
> > We have a hard time understanding why you are changing the logic here.
> > Seems that the timer you created here will get fired no matter what. What
> > if the 'catching' operation succeeds?
> >
> > IIUC, except for the 'finalize' function, you don't have to do any
> > change here.
>
> Benjamin Hindman wrote:
> With the old semantics calling 'timedout' would do 'catching.discard()'
> which would cause 'discarded' to get invoked which would restart 'catchup'.
> There was a lot weird about this IMHO:
>
> (1) We used 'catching.discard()' to imply a timeout, even the code in
> 'discarded' mentions the timeout, but that discard could have come from
> 'log::catchup'!
> (2) Given (1), if 'log::catchup' actually discarded the future we just
> simply tried again! :(
>
> So now, when we timeout, we simply want to start another 'log::catchup'.
> Note that we don't wait for the old 'log::catchup' to complete just as we
> didn't before. In addition, a discarded event now properly propagates, and in
> this case I choose to propagate it as an error.
>
> I did notice here that I should really do 'catching.discard()' in
> 'timedout' and then skip old attempts at 'log::catchup' in 'discarded' so I'm
> keeping this issue open for you to take another look.

Regarding (1), we thought about what the right semantics for discard should be. Here is what we came up with:

We should maintain the following invariant in our code base: if promise.future().hasDiscard() is false, we should not do promise.discard(). This greatly simplifies reasoning about the discard semantics.

In this case, we should never assume that log::catchup will get discarded internally. If an async operation decides to terminate before completion, that should be a failure (i.e., the future will be set to FAILED instead of DISCARDED).


- Jie


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/17686/#review34246
-----------------------------------------------------------


On Feb. 18, 2014, 8:46 a.m., Benjamin Hindman wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/17686/
> -----------------------------------------------------------
>
> (Updated Feb. 18, 2014, 8:46 a.m.)
>
>
> Review request for mesos, Adam B, Ben Mahler, Ian Downes, Jie Yu, Niklas
> Nielsen, TILL TOENSHOFF, Vinod Kone, and Jiang Yan Xu.
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/java/jni/org_apache_mesos_state_AbstractState.cpp 2ee0b1b631b80ec783e6bce683cdeaa77e56b2aa
>   src/linux/cgroups.cpp 8ac25993886f5092fe6a58abdddbbb71e02911af
>   src/log/catchup.cpp 59facbf9b5d065ba5066b6b443db4ee8e05ee33b
>   src/log/consensus.cpp b89673a3b8f233e901eaf9ae69a9979099f4eb73
>   src/log/log.cpp e83f822af86a2389e2b1abab9489713cb59838c2
>   src/log/recover.cpp d06e5ada714ba0e5359ff1d3381edb6d526c6ad3
>   src/master/detector.cpp 7e10433013b9415ec73c388e8dc69ab0989cdbc2
>   src/master/registrar.cpp 915885a160f790399e8185c28c6e6555af1ee76e
>   src/sasl/authenticatee.hpp f1a677f8aed0979f958e51f85e0a8210a03bd343
>   src/sasl/authenticator.hpp 1478f6771b424555c34586a0d61f208dc15b0e7d
>   src/slave/gc.hpp 328aa315d8ebd7ea5d05b57626ea4dbfc2206270
>   src/slave/gc.cpp 405350bf8f498d2e59e9e6b4c4c19b7bdaa974de
>   src/zookeeper/contender.cpp 6710da4e64fc0a43c1eabfc0f39fb0133c13df14
>   src/zookeeper/group.cpp 793763e8312676f3b7cceb54bbad0337c8445ea7
>
> Diff: https://reviews.apache.org/r/17686/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Benjamin Hindman
>
>
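
P.S. To make the invariant from my reply on catchup.cpp concrete (no promise.discard() unless hasDiscard() is true; an operation that gives up on its own ends up FAILED), here is a rough, self-contained sketch. ToyPromise, ToyFuture, State and terminateEarly are hypothetical names invented for this illustration. They are not the libprocess classes, they carry no value, and they are not thread-safe; only the hasDiscard()/discard() naming is borrowed from the discussion above.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Toy stand-ins for illustration only; NOT the libprocess API.
enum class State { PENDING, READY, FAILED, DISCARDED };

struct Shared {
  State state = State::PENDING;
  bool discardRequested = false;  // Set by the consumer, never by the producer.
  std::string error;
};

class ToyFuture {
public:
  explicit ToyFuture(std::shared_ptr<Shared> s) : shared(std::move(s)) {}

  // Consumer side: only *requests* a discard; the state stays PENDING.
  void discard() {
    if (shared->state == State::PENDING) {
      shared->discardRequested = true;
    }
  }

  bool hasDiscard() const { return shared->discardRequested; }
  State state() const { return shared->state; }

private:
  std::shared_ptr<Shared> shared;
};

class ToyPromise {
public:
  ToyPromise() : shared(std::make_shared<Shared>()) {}

  ToyFuture future() const { return ToyFuture(shared); }

  bool set() {
    if (shared->state != State::PENDING) return false;
    shared->state = State::READY;
    return true;
  }

  bool fail(const std::string& message) {
    if (shared->state != State::PENDING) return false;
    shared->state = State::FAILED;
    shared->error = message;
    return true;
  }

  // Enforces the invariant: refuse to discard unless the consumer asked.
  bool discard() {
    if (shared->state != State::PENDING || !shared->discardRequested) {
      return false;
    }
    shared->state = State::DISCARDED;
    return true;
  }

private:
  std::shared_ptr<Shared> shared;
};

// An operation that stops before completion: DISCARDED only when the
// caller requested it, otherwise FAILED with the given reason.
State terminateEarly(ToyPromise& promise, const std::string& reason) {
  if (!promise.discard()) {
    promise.fail(reason);
  }
  State result = promise.future().state();
  assert(result != State::PENDING);  // Some terminal state was reached.
  return result;
}
```

With this shape, a caller that sees DISCARDED knows the discard came from its own request, which is the simplification we are after; an internal timeout would surface as FAILED.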