-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13259/#review24746
-----------------------------------------------------------

Ship it!


src/master/hierarchical_allocator_process.hpp
<https://reviews.apache.org/r/13259/#comment48850>

    Can we keep this a CHECK by not invoking slaveRemoved from the master if the slave is disconnected?


src/master/master.hpp
<https://reviews.apache.org/r/13259/#comment48852>

    Can we highlight that we care about disconnected slaves because, with slave recovery, the slave might reconnect?


src/master/master.cpp
<https://reviews.apache.org/r/13259/#comment48853>

    Can we add a TODO that explores Allocator::slaveDisconnected?


src/master/master.cpp
<https://reviews.apache.org/r/13259/#comment48856>

    Yes, this is a bug in the allocator. Let's do Allocator::resourcesRecovered for now, with a TODO that reevaluates this after we fix the bug in the allocator.


src/master/master.cpp
<https://reviews.apache.org/r/13259/#comment48861>

    This sounds like a bug too. We should add a test for this.


src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13259/#comment48862>

    s/isolator1/isolator/


src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/13259/#comment48863>

    Let's make it clear (here and elsewhere) that we're shutting down the executor manually because otherwise it won't get cleaned up, rather than to cause some event to occur in the test (which was my first impression).


- Benjamin Hindman


On Aug. 6, 2013, 8:44 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/13259/
> -----------------------------------------------------------
> 
> (Updated Aug. 6, 2013, 8:44 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Bugs: MESOS-614
>     https://issues.apache.org/jira/browse/MESOS-614
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> When the master is waiting for a disconnected checkpointed slave to
> "reregister", but the restarted slave tries to "register" (because it failed
> recovery), the master should remove the old slave. Also, we remove the
> disconnected slave from the allocator so that we don't offer its resources.
> 
> 
> Diffs
> -----
> 
>   src/master/hierarchical_allocator_process.hpp ce06e7437b760fa6356ef86ae6800907282865ed
>   src/master/master.hpp 8ec0c17c71b7b4679d4f712d0fb742d420c9152d
>   src/master/master.cpp b0a2757af3ec83ead53374504fe24d3a8f7673ad
>   src/tests/slave_recovery_tests.cpp c451e0f4c571a646d375aa89e806e1a4058d39e7
> 
> Diff: https://reviews.apache.org/r/13259/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
