----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22796/#review48665 -----------------------------------------------------------
Thanks Adam and Tim! Looks good, I made a few small suggestions below, mainly to try to avoid logging an INFO message every time the offer removal race occurs. Was surprised that the tests needed to use 2x the timeout? Did you run them in repetition? src/master/flags.hpp <https://reviews.apache.org/r/22796/#comment85390> If you look above, only one space between each flag. :) src/master/flags.hpp <https://reviews.apache.org/r/22796/#comment85392> How about: "Duration of time before an offer is rescinded from a framework. This can help fairness when running frameworks that hold on to offers, or frameworks that accidentally drop offers. To disable the timeout, set this to '0secs'". Will '0' work for this, since it's a duration? src/master/master.hpp <https://reviews.apache.org/r/22796/#comment85393> const & src/master/master.cpp <https://reviews.apache.org/r/22796/#comment85394> How about a little comment here: // Rescind the offer after the timeout elapses. What about wrapping at the delay, might be less "jagged" since it's only a few characters: offerTimers[offer->id()] = delay(flags.offer_timeout, self(), &Self::removeOffer, offer->id(), true); src/master/master.cpp <https://reviews.apache.org/r/22796/#comment85395> const & src/master/master.cpp <https://reviews.apache.org/r/22796/#comment85396> It looks like this NULL case is only possible for the rescind timeout, since all synchronous calls to removeOffers previously could not hit this condition. What about instead adding a small method for the delay() to invoke: void Master::offerTimeout(const OfferID&) { Offer* offer = getOffer(offerId); if (offer != NULL) { removeOffer(offer, true); } } This way, we only introduce the NULL case where it's expected and harmless. It makes it even less critical to remove the timers correctly since it wouldn't have an impact on the logging. The other reason I would suggest this approach is that the contents of offerTimeout would be a possible candidate for being moved to a C++11 lambda function instead of making a member method. src/master/master.cpp <https://reviews.apache.org/r/22796/#comment85406> hashmap has .contains(): if (offerTimers.contains(offer->id())) { ... } src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85398> Why did you need 2x the timeout? 1x should be enough, no? Could you add a little comment to this section? E.g. " Now advance to the offer timeout, we need to settle the clock to ensure that the offer rescind timeout would be processed if triggered. " src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85397> Might want to store the flags, pass them to the master and use them here, to ensure they have the same values: master::Flags masterFlags = MesosTest::CreateMasterFlags(); Try<PID<Master> > master = StartMaster(masterFlags); ... Clock::advance(masterFlags.offer_timeout * 2.0); Otherwise there is an implicit assumption that StartMaster() uses CreateMasterFlags(). src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85399> I'm a bit puzzled by this test, I don't think we need this test since the framework cannot receive _any_ events after a framework unregistration, which seems orthogonal to the timeout you've introduced. src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85401> Could you add the same comment here that I mentioned above? src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85405> s/task/a task/ src/tests/master_tests.cpp <https://reviews.apache.org/r/22796/#comment85404> Ditto above for comment and 1x timeout. - Ben Mahler On July 24, 2014, 3 a.m., Timothy Chen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/22796/ > ----------------------------------------------------------- > > (Updated July 24, 2014, 3 a.m.) > > > Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen. > > > Bugs: MESOS-186 > https://issues.apache.org/jira/browse/MESOS-186 > > > Repository: mesos-git > > > Description > ------- > > Based on Kapil's patch (https://reviews.apache.org/r/22066/), adding timeout > for each offer from master to remove the offer when it's no longer used. > > > Diffs > ----- > > src/master/flags.hpp 32704ce > src/master/master.hpp fa46a67 > src/master/master.cpp fb2fd5a > src/tests/master_tests.cpp 5a1cf7f > > Diff: https://reviews.apache.org/r/22796/diff/ > > > Testing > ------- > > Added three more unit tests from Kapil's patch: Testing offer not rescinded > after task launched, offer not rescinded when framework/slave unregistered. > The test exposed a race condition that can lead to a segfault if two remove > offers are called on the same offer. > > make check. > > > Thanks, > > Timothy Chen > >
