> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > Looks great!
> > 
> > Main two comments from below are:
> > 
> > (1) Could we now removing the SUM <-> Slave coupling by passing a function 
> > to the SUM, instead of the pid? (scheduler.hpp and 
> > low_level_scheduler_libprocess.cpp will provide an example).
> > 
> > (2) Do we want to introduce `pause()` and `resume()` (as opposed to just 
> > `flush()`) on the SUM to avoid all the dropped update logging for 
> > disconnected slaves?

i'll do the SUM improvements in a subsequent review, that i'll add as a 
dependent.


> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > src/master/master.cpp, lines 3259-3262
> > <https://reviews.apache.org/r/26846/diff/1/?file=723849#file723849line3259>
> >
> >     Should we have a TODO here to start using 'from' in a later version 
> > (0.22.0)? Is there value in keeping the 'pid' field now that it's sent via 
> > the slave, and acks go through the master?

yes, added a TODO.


> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 2348
> > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2348>
> >
> >     Should there be any comment here about the case when the slave already 
> > received an ack for this particular update (because of the race)?

added a comment.


> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 2350-2352
> > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2350>
> >
> >     Any value to this full state CHECK? Or is it for consistency?

consistency. it was originally added as future proof, in case someone added a 
new state and didn't think about it's effect on all these methods.


> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 2354-2359
> > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line2354>
> >
> >     Is this really a WARNING? Sounds like it's going to happen a lot..?
> >     
> >     Will this dominate the logs for disconnected slaves?

good point. with resume, we should see fewer of these warnings. a log message 
is still useful in my opinion during debugging.


> On Oct. 17, 2014, 6:23 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 3437-3440
> > <https://reviews.apache.org/r/26846/diff/1/?file=723851#file723851line3437>
> >
> >     Thank you for the comment about non-local reasoning needed here :)
> >     
> >     Maybe add a tiny bit about the timeout? Or say that the executor 
> > shutdown "process" began in `_recover()`? The part that is still a bit 
> > non-local IMO is that it's the signal _and_ the timeout that guarantee 
> > slave shutdown completes?

_recover() either calls shutdownExecutor() (which calls 
containerizer->destroy() after timeout) or containerizer->destroy().

The only guarantee that the slave terminates is that containerizer->destroy() 
eventually results in executorTerminated().

Expanded the comment.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26846/#review57168
-----------------------------------------------------------


On Oct. 17, 2014, 12:24 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26846/
> -----------------------------------------------------------
> 
> (Updated Oct. 17, 2014, 12:24 a.m.)
> 
> 
> Review request for mesos, Adam B, Ben Mahler, and Niklas Nielsen.
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> All the updates from the status update manager are funnelled via the slave. 
> No more sending updates in the middle of recovery or disconnection. Yay.
> 
> Had to cleanup flags.recover = "cleanup" semantics because the updates are no 
> longer sent when the slave is disconnected. 
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 0a5c9a374062a241c90ea238725fbb8dd2408ef4 
>   src/slave/slave.hpp 342b09fc084c20d98d096bb129830440179c092c 
>   src/slave/slave.cpp 0e342ed35e3db3b68f9f32b6cf4ace23e4a4db38 
>   src/slave/status_update_manager.hpp 
> 24e3882ad1969c20a64daf90e408618c310e9a81 
>   src/slave/status_update_manager.cpp 
> 5d5cf234ef2dd47fa4b1f67be761dbca31659451 
>   src/tests/slave_recovery_tests.cpp 4fb357bd55f69f71193e92fd03765b808f932d33 
> 
> Diff: https://reviews.apache.org/r/26846/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>

Reply via email to