> On March 19, 2014, 12:45 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 3576-3588
> > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line3576>
> >
> > So the checkpointing of executor info is now being done after it is
> > launched. So if a slave restarts before finalize() gets called there is no
> > way to recover this info and inform the master. This is probably ok if the
> > master stays up because the state will be reconciled when the slave
> > re-registers. If the master also fails over then all bets are off and no
> > one knows about the lost task/executor. This is unfortunate but I guess no
> > different than if the slave restarted when the task was launched. Let's add
> > a test for this.
>
> Niklas Nielsen wrote:
>     You bet - sounds like a great idea. I'll work on it.
>
> Niklas Nielsen wrote:
>     So just to clarify (and to get some input on how the test would work):
>     This would involve imitating a launch which doesn't get to return an executor
>     info in time before a fail-over happens?

Moving this to r19795


> On March 19, 2014, 12:45 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 1081-1083
> > <https://reviews.apache.org/r/18403/diff/5/?file=522999#file522999line1081>
> >
> > Why is this a CHECK? What guarantees a framework will not be removed?
>
> Niklas Nielsen wrote:
>     The CHECK should indeed be dropped. I don't think it is much different from
>     the entry to killTask(); should we just ignore in _killTask with a warning
>     instead (like in killTask)?
>
> Niklas Nielsen wrote:
>     Ping :)

Moving this to r19795


- Niklas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18403/#review37753
-----------------------------------------------------------


On April 8, 2014, 2:19 p.m., Niklas Nielsen wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18403/
> -----------------------------------------------------------
>
> (Updated April 8, 2014, 2:19 p.m.)
>
>
> Review request for mesos, Ian Downes and Vinod Kone.
>
>
> Bugs: MESOS-922
>     https://issues.apache.org/jira/browse/MESOS-922
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> This patch delegates the choice of executor to the containerizer by removing
> executorInfo dependencies up until Containerizer::launch().
> Containerizer::launch() now returns a future to the executor info that is
> being run and the slave creates the corresponding executor structure when
> launch completes.
> This means that messages from the running executor to the slave that arrive
> in the interim, before the executor structure has been created, need to be
> enqueued until the executor is ready.
>
>
> Diffs
> -----
>
>   src/slave/containerizer/containerizer.hpp d9ae326
>   src/slave/containerizer/mesos_containerizer.hpp ee1fd30
>   src/slave/containerizer/mesos_containerizer.cpp c819c97
>   src/slave/slave.hpp 15e23ce
>   src/slave/slave.cpp a356f5f
>   src/tests/containerizer.hpp a9f1531
>   src/tests/containerizer.cpp bfb9341
>
> Diff: https://reviews.apache.org/r/18403/diff/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Niklas Nielsen
>
>
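
For reference, the enqueueing mentioned in the quoted description (executor messages held back until the executor structure exists) could look roughly like the sketch below. This is not code from the patch. `PendingExecutor`, `receive()`, `markReady()`, and the string-typed messages are hypothetical names made up for illustration, and a plain callback stands in for the completion of the future returned by Containerizer::launch().

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the slave-side bookkeeping of one executor.
// Not Mesos code: it only illustrates "queue until ready, then flush in order".
class PendingExecutor {
public:
  using Handler = std::function<void(const std::string&)>;

  explicit PendingExecutor(Handler handler) : handler_(std::move(handler)) {}

  // Called for every message arriving from the running executor.
  void receive(const std::string& message) {
    if (ready_) {
      handler_(message);            // Executor structure exists: handle now.
    } else {
      queued_.push_back(message);   // Launch still pending: hold the message.
    }
  }

  // Called once, when the (assumed) launch completion callback fires.
  void markReady() {
    assert(!ready_);
    ready_ = true;

    // Move the backlog aside first so a handler that calls receive()
    // does not modify the container being iterated.
    std::vector<std::string> backlog;
    backlog.swap(queued_);
    for (const std::string& message : backlog) {
      handler_(message);            // Deliver in arrival order.
    }
  }

  std::size_t queued() const { return queued_.size(); }

private:
  Handler handler_;
  bool ready_ = false;
  std::vector<std::string> queued_;
};
```

In this sketch, messages received before `markReady()` are delivered in arrival order once it is called, and later messages go straight to the handler. It does not cover the restart case raised in the review, where the slave goes down before the executor info has been checkpointed.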
