> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > Looks great! > > > > Main two comments from below are: > > > > (1) Could we now removing the SUM <-> Slave coupling by passing a function > > to the SUM, instead of the pid? (scheduler.hpp and > > low_level_scheduler_libprocess.cpp will provide an example). > > > > (2) Do we want to introduce `pause()` and `resume()` (as opposed to just > > `flush()`) on the SUM to avoid all the dropped update logging for > > disconnected slaves?
i'll do the SUM improvements in a subsequent review, that i'll add as a dependent. > On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > src/master/master.cpp, lines 3259-3262 > > <https://reviews.apache.org/r/26846/diff/1/?file=723849#file723849line3259> > > > > Should we have a TODO here to start using 'from' in a later version > > (0.22.0)? Is there value in keeping the 'pid' field now that it's sent via > > the slave, and acks go through the master? yes, added a TODO. > On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > src/slave/slave.cpp, line 2348 > > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2348> > > > > Should there be any comment here about the case when the slave already > > received an ack for this particular update (because of the race)? added a comment. > On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 2350-2352 > > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2350> > > > > Any value to this full state CHECK? Or is it for consistency? consistency. it was originally added as future proof, in case someone added a new state and didn't think about it's effect on all these methods. > On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 2354-2359 > > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2354> > > > > Is this really a WARNING? Sounds like it's going to happen a lot..? > > > > Will this dominate the logs for disconnected slaves? good point. with resume, we should see fewer of these warnings. a log message is still useful in my opinion during debugging. > On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote: > > src/slave/slave.cpp, lines 3437-3440 > > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line3437> > > > > Thank you for the comment about non-local reasoning needed here :) > > > > Maybe add a tiny bit about the timeout? Or say that the executor > > shutdown "process" began in `_recover()`? The part that is still a bit > > non-local IMO is that it's the signal _and_ the timeout that guarantee > > slave shutdown completes? _recover() either calls shutdownExecutor() (which calls containerizer->destroy() after timeout) or containerizer->destroy(). The only guarantee that the slave terminates is that containerizer->destroy() eventually results in executorTerminated(). Expanded the comment. - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/26846/#review57168 ----------------------------------------------------------- On Oct. 17, 2014, 12:24 a.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/26846/ > ----------------------------------------------------------- > > (Updated Oct. 17, 2014, 12:24 a.m.) > > > Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen. > > > Repository: mesos-git > > > Description > ------- > > All the updates from the status update manager are funnelled via the slave. > No more sending updates in the middle of recovery or disconnection. Yay. > > Had to cleanup flags.recover = "cleanup" semantics because the updates are no > longer sent when the slave is disconnected. > > > Diffs > ----- > > src/master/master.cpp 0a5c9a374062a241c90ea238725fbb8dd2408ef4 > src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c > src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38 > src/slave/status_update_manager.hpp > 24e3882ad1969c20a64daf90e408618c310e9a81 > src/slave/status_update_manager.cpp > 5d5cf234ef2dd47fa4b1f67be761dbca31659451 > src/tests/slave_recovery_tests.cpp 4fb357bd55f69f71193e92fd03765b808f932d33 > > Diff: https://reviews.apache.org/r/26846/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Vinod Kone > >
