----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18403/#review37753 -----------------------------------------------------------
Can we add some tests around the new refactor (e.g., race conditions mentioned in the review)? That would make us confident that we are not regressing here, because this is a significant change. src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69495> Why the change in return type? src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69408> s/finalize/launched/ ? Also add a comment on what this method does. src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69403> Why do we need this? Why not just set Executor::id appropriately in the constructor? src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69451> There are several places where executor state is checked and an appropriate action is taken. You should update them all with this state. I am not convinced that adding a new state is useful (or more trouble than its worth) here considering we have a future that tells whether it is launching or not. src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69405> +1 to Ian's comment. src/slave/slave.hpp <https://reviews.apache.org/r/18403/#comment69406> kill this. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69417> Why is this a CHECK? What guarantees that the executor info is set when we are here? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69429> This should be moved to __runTask() to ensure framework doesn't get removed before __runTask() gets called. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69423> Why not just always do onAny(_runTask) irrespective of the state? More importantly the current way it is done could lead to running tasks out of order. For example, if the first task is defered due to a launch, and the future is set while a second task is in the master's queue then the second task will be launched before the first. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69425> Sending 'true' here seems a bit hacky. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69432> What about the slave state checks similar to what we have in _runTask()? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69441> Lets just pass executorId to this method instead of using this. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69442> ditto. always enqueue on onAny. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69446> Why is this a CHECK? What guarantees a framework will not be removed? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69447> ditto. why is this a check? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69453> Indentation. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69454> ditto on onAny. Also, how is this possible? An executor is still launching but it is registering? Do we want to enforce an order here? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69460> You should add checks for slave states here. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69455> ditto. Why is this a check? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69456> Lets add a CHECK here :) src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69461> Indentation. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69462> ditto. enque on onAny. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69469> Include the executor and framework ids in log messages like these. Here and everywhere else. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69463> Check for slave states. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69496> Maybe we should call this executorLaunched? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69471> Is this only called here? If yes, lets not pull this out. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69472> Add a CHECK or just use executorInfo. src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69474> Why are these CHECKs? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69479> Why not just executor->future = ... ? src/slave/slave.cpp <https://reviews.apache.org/r/18403/#comment69485> So the checkpointing of executor info is now being done after it is launched. So if a slave restarts before finalize() gets called there is no way to recover this info and inform the master. This is probably ok if the master stays up because the state will be reconciled when the slave re-registers. If the master also fails over then all bets are off and no one knows about the lost task/executor. This is unfortunate but I guess no different than if the slave restarted when the task was launched. Lets add a test for this. - Vinod Kone On March 19, 2014, 7:31 p.m., Niklas Nielsen wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/18403/ > ----------------------------------------------------------- > > (Updated March 19, 2014, 7:31 p.m.) > > > Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Ian Downes, > Till Toenshoff, and Vinod Kone. > > > Bugs: MESOS-922 > https://issues.apache.org/jira/browse/MESOS-922 > > > Repository: mesos-git > > > Description > ------- > > This patch delegates the choice of executor to the containerizer by removing > executorInfo dependencies up until Containerizer::launch(). > Containerizer::launch() now returns a future to the executor info that is > being run and the slave creates the corresponding executor structure when > launch completes. > This means message handling from the running executor to the slave in the > interim where the executor structure has not created, need to be enqueued > until executor is ready. So far, registerExecutor() and reregisterExecutor() > has been split into two continuations to deal with this issue. > > > Diffs > ----- > > src/slave/containerizer/containerizer.hpp d9ae326 > src/slave/containerizer/mesos_containerizer.hpp ee1fd30 > src/slave/containerizer/mesos_containerizer.cpp c819c97 > src/slave/http.cpp 594032d > src/slave/slave.hpp 01b80df > src/slave/slave.cpp d8d3e0f > src/tests/containerizer.hpp 5686398 > src/tests/containerizer.cpp bfb9341 > > Diff: https://reviews.apache.org/r/18403/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Niklas Nielsen > >
