Re: Review Request: Slave Restart (Part 8): Recover status update manager

Vinod Kone Tue, 29 Jan 2013 14:50:49 -0800


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 111
> > <https://reviews.apache.org/r/8736/diff/2/?file=243208#file243208line111>
> >
> >     I'd like to make sure we understand and clearly document the semantics 
> > of what it means to have the default '--recovery=reconnect' when the slave 
> > doesn't actually have any old slave state to reconnect to.


updated the comment. lemme if it looks ok.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1456
> > <https://reviews.apache.org/r/8736/diff/2/?file=243209#file243209line1456>
> >
> >     I do not like the semantics of an error getting returned and us 
> > deciding that we've "successfully" recovered! Considering using a Result to 
> > distinguish _no_ slave state to recover from an actual error. This needs to 
> > get done everywhere that an error is swallowed (even though it's logged).

we always exit when recovery fails now.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 1466-1468
> > <https://reviews.apache.org/r/8736/diff/2/?file=243209#file243209line1466>
> >
> >     Invoking recover on the status update manager is asynchronous. In the 
> > event there was an error recovering, I don't think we should set 'recovered 
> > = true' so optimistically!

Made 'recover' asynchronous.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.cpp, line 175
> > <https://reviews.apache.org/r/8736/diff/2/?file=243211#file243211line175>
> >
> >     I don't like that semantics of using the "checkpointing" flag to "hack" 
> > the replaying of checkpoints. It seems like replying a stream should be a 
> > fundamental primitive, and shouldn't require a breach of isolation. Let's 
> > really separate the concerns here such that the StatusUpdateManager doesn't 
> > need to concern itself with the internals of the StatusUpdateStream's 
> > implementation.

agreed, refactored.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.cpp, line 138
> > <https://reviews.apache.org/r/8736/diff/2/?file=243211#file243211line138>
> >
> >     Is there a non-hacky way to do this? If so, can you leave a TODO that 
> > describes what you're thinking for posterity?

“I never put off till tomorrow what I can possibly do - the day after.” -- O.W.

fixed.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/#review15590
-----------------------------------------------------------


On Dec. 25, 2012, 7:57 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8736/
> -----------------------------------------------------------
> 
> (Updated Dec. 25, 2012, 7:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Status update manager now recovers the update stream and re-sends any pending 
> updates.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8736/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>

Re: Review Request: Slave Restart (Part 8): Recover status update manager

Reply via email to