-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26846/#review57168
-----------------------------------------------------------
Looks great! My two main comments from below are:

(1) Could we now remove the SUM <-> Slave coupling by passing a function to the SUM instead of the pid? (scheduler.hpp and low_level_scheduler_libprocess.cpp provide an example.)

(2) Do we want to introduce `pause()` and `resume()` (as opposed to just `flush()`) on the SUM to avoid all the dropped-update logging for disconnected slaves?


src/master/master.cpp
<https://reviews.apache.org/r/26846/#comment97671>

    Should we have a TODO here to start using 'from' in a later version (0.22.0)? Is there value in keeping the 'pid' field now that it's sent via the slave, and acks go through the master?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97690>

    Hm.. this was a bit tricky to grasp. Should this be a CHECK now, to reflect that a 'cleanup' slave should not be (re-)registering?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97698>

    In line with my other comments, do we want an equivalent `pause()` ability on the SUM to avoid all the dropped logging? At which point `flush()` would become `resume()`.


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97697>

    Should there be a comment here about the case when the slave already received an ack for this particular update (because of the race)?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97695>

    Any value to this full state CHECK? Or is it just for consistency?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97694>

    Is this really a WARNING? It sounds like it's going to happen a lot..? Will this dominate the logs for disconnected slaves?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97700>

    Hm.. maybe do a fresh re-work of this entire comment to avoid the redundant mention of the slave / framework terminating cases?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97702>

    Maybe this conditional is a bit more intuitive the other way?
```
if (state == TERMINATING && frameworks.empty()) { ... }
```

    Because it seems that one thinks of this as "we're terminating, and we've removed the last framework as part of our termination":

```
// Still terminating..
state == TERMINATING
frameworks.empty() == false

// Still terminating..
state == TERMINATING
frameworks.empty() == false

// Done!
state == TERMINATING
frameworks.empty() == true
```

    Or am I misinterpreting how to think about this?


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97705>

    Should this be an else if, with an else that fails due to an unknown recover flag? Or a CHECK inside the else that recover == "cleanup"? I just want to enforce the assumption that it's "cleanup" here.


src/slave/slave.cpp
<https://reviews.apache.org/r/26846/#comment97706>

    Thank you for the comment about the non-local reasoning needed here :) Maybe add a tiny bit about the timeout? Or say that the executor shutdown "process" began in `_recover()`? The part that is still a bit non-local IMO is that it's the signal _and_ the timeout together that guarantee the slave shutdown completes.


src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/26846/#comment97689>

    Do we still need to pass `flags` during initialize, or can it be done in the SUM constructor?


src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/26846/#comment97688>

    I'm curious, could we start reducing the coupling to the `Slave` here? Now that we no longer have the hard dependency on the slave `PID` for status update messages, I'm thinking we can just pass in a function callback for forwarding:

```
SUM::initialize(const function<void(const StatusUpdate&)>& forward);

Slave::initialize()
{
  ...
  sum->initialize(defer(self(), &Slave::forward, lambda::_1));
  ...
}
```

    In this first step, the SUM no longer needs to know anything about `Slave`.
    As a second step, we could remove the need for `initialize` entirely by providing a streaming updates method, which would leverage a Stream abstraction in libprocess:

```
Stream<StatusUpdate> SUM::updateStream();

Slave::initialize()
{
  ...
  sum->updateStream()
    .next(defer(self(), &Slave::forward, lambda::_1));
  ...
}

Slave::forward(Stream<StatusUpdate> updates)
{
  ...
  updates
    .next(defer(self(), &Slave::forward, lambda::_1));
}
```

    But this one is distant. :) Thoughts? Probably drop some TODOs, but the first one would be nice to tackle. :)

- Ben Mahler


On Oct. 17, 2014, 12:24 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26846/
> -----------------------------------------------------------
> 
> (Updated Oct. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> All the updates from the status update manager are funnelled via the slave.
> No more sending updates in the middle of recovery or disconnection. Yay.
> 
> Had to clean up the flags.recover = "cleanup" semantics because the updates
> are no longer sent when the slave is disconnected.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 0a5c9a374062a241c90ea238725fbb8dd2408ef4
>   src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c
>   src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38
>   src/slave/status_update_manager.hpp 24e3882ad1969c20a64daf90e408618c310e9a81
>   src/slave/status_update_manager.cpp 5d5cf234ef2dd47fa4b1f67be761dbca31659451
>   src/tests/slave_recovery_tests.cpp 4fb357bd55f69f71193e92fd03765b808f932d33
> 
> Diff: https://reviews.apache.org/r/26846/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>
