> On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > Can we add some tests around the new refactor (e.g., race conditions > > mentioned in the review)? That would make us confident that we are not > > regressing here, because this is a significant change.
Hey Vinod - thanks a lot for looking at this! But I think we just missed each other; I uploaded a new patch before you got to publish your review. I will post a new patch based on your recent comments and ping you :) > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.hpp, line 207 > > <https://reviews.apache.org/r/18403/diff/5/?file=522998#file522998line207> > > > > Why the change in return type? Changed in recent review. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.hpp, line 438 > > <https://reviews.apache.org/r/18403/diff/5/?file=522998#file522998line438> > > > > s/finalize/launched/ ? > > > > Also add a comment on what this method does. Changed to launchCompleted() in new patch. Will add a comment, thanks! :) > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.hpp, line 447 > > <https://reviews.apache.org/r/18403/diff/5/?file=522998#file522998line447> > > > > There are several places where executor state is checked and an > > appropriate action is taken. You should update them all with this state. I > > am not convinced that adding a new state is useful (or more trouble than > > its worth) here considering we have a future that tells whether it is > > launching or not. I am not opposed to strip that out again. I introduced it based on previous comments and on the fact, the executor need to be in a pre-registering state when the executor info is unknown. Otherwise, the launching state is kinda implicit (based on executor->info state). I'll go ahead and take a stab at removing LAUNCHING. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.hpp, line 460 > > <https://reviews.apache.org/r/18403/diff/5/?file=522998#file522998line460> > > > > +1 to Ian's comment. Changed in recent patch. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 2159 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line2159> > > > > Maybe we should call this executorLaunched? Good point - will change in upcoming patch. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 871 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line871> > > > > Sending 'true' here seems a bit hacky. Changed in recent review. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 916 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line916> > > > > Lets just pass executorId to this method instead of using this. Great point - thanks! > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 3154 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line3154> > > > > Why not just > > > > executor->future = ... ? Changed in recent patch. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 2342-2346 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line2342> > > > > Why are these CHECKs? Paranoia I suppose :) > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 3576-3588 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line3576> > > > > So the checkpointing of executor info is now being done after it is > > launched. So if a slave restarts before finalize() gets called there is no > > way to recover this info and inform the master. This is probably ok if the > > master stays up because the state will be reconciled when the slave > > re-registers. If the master also fails over then all bets are off and no > > one knows about the lost task/executor. This is unfortunate but I guess no > > different than if the slave restarted when the task was launched. Lets add > > a test for this. You bet - sounds like a great idea. I'll work on it. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 2206 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line2206> > > > > Is this only called here? If yes, lets not pull this out. Sure can do. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 1546-1547 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line1546> > > > > ditto. Why is this a check? Paranoia :) > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.hpp, line 440 > > <https://reviews.apache.org/r/18403/diff/5/?file=522998#file522998line440> > > > > Why do we need this? > > > > Why not just set Executor::id appropriately in the constructor? > > Good point! > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, line 785 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line785> > > > > This should be moved to __runTask() to ensure framework doesn't get > > removed before __runTask() gets called. Ah - good catch. Thanks! > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 860-871 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line860> > > > > Why not just always do onAny(_runTask) irrespective of the state? > > > > More importantly the current way it is done could lead to running tasks > > out of order. For example, if the first task is defered due to a launch, > > and the future is set while a second task is in the master's queue then the > > second task will be launched before the first. Good catch. Will change in upcoming patch. > On March 19, 2014, 12:45 p.m., Vinod Kone wrote: > > src/slave/slave.cpp, lines 1055-1065 > > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line1055> > > > > ditto. always enqueue on onAny. - Niklas ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18403/#review37753 ----------------------------------------------------------- On March 19, 2014, 12:46 p.m., Niklas Nielsen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18403/ > ----------------------------------------------------------- > > (Updated March 19, 2014, 12:46 p.m.) > > > Review request for mesos, Ian Downes and Vinod Kone. > > > Bugs: MESOS-922 > https://issues.apache.org/jira/browse/MESOS-922 > > > Repository: mesos-git > > > Description > ------- > > This patch delegates the choice of executor to the containerizer by removing > executorInfo dependencies up until Containerizer::launch(). > Containerizer::launch() now returns a future to the executor info that is > being run and the slave creates the corresponding executor structure when > launch completes. > This means message handling from the running executor to the slave in the > interim where the executor structure has not created, need to be enqueued > until executor is ready. So far, registerExecutor() and reregisterExecutor() > has been split into two continuations to deal with this issue. > > > Diffs > ----- > > src/slave/containerizer/containerizer.hpp d9ae326 > src/slave/containerizer/mesos_containerizer.hpp ee1fd30 > src/slave/containerizer/mesos_containerizer.cpp c819c97 > src/slave/http.cpp 594032d > src/slave/slave.hpp 01b80df > src/slave/slave.cpp d8d3e0f > src/tests/containerizer.hpp 5686398 > src/tests/containerizer.cpp bfb9341 > > Diff: https://reviews.apache.org/r/18403/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Niklas Nielsen > >
