-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22526/#review46192
-----------------------------------------------------------

Great start. I didn't get into the tests at all, and only skimmed over the rest of the code, but here are a few comments from my first pass.

Since this is a huge diff (>1500 new/modified lines), how about splitting it up into multiple reviews/phases? Here are a few splits I can think of:
- Refactor launchTask to pull out common logic needed by resizeTask
- Add scheduler API (returns error in master, does nothing in slave)
- Add master logic and slave API (returns error in slave)
- Add slave logic
- Add master logic to verify/handleResizeTaskReply
- Add metrics
- Add tests


include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81475>

    I worry that some code (ours or clients') may assume that a state > TASK_RUNNING is terminal. I know that hasn't been true since the creation of TASK_STAGING, but it's a general concern of mine. Perhaps it would have been better to have started the TERMINAL states at 10 or 100.


include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81476>

    Why a boolean instead of an optional error message? Then you could do

        if (resizeTaskReply.has_error()) {
          LOG(ERROR) << resizeTaskReply.error();
        } else {
          // do something with resizeTaskReply
        }


include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81477>

    Why do you need both old & new resources? In case multiple resize tasks come through at a time, or to handle resource changes during slave failovers or other tasks starting/completing?


include/mesos/scheduler.hpp
<https://reviews.apache.org/r/22526/#comment81478>

    embedded


include/mesos/scheduler.hpp
<https://reviews.apache.org/r/22526/#comment81479>

    s/is succeeded or not/succeeded/


include/mesos/scheduler/scheduler.proto
<https://reviews.apache.org/r/22526/#comment81480>

    Where did '2' go?

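
    To make the mesos.proto concern about states above TASK_RUNNING concrete, here is a minimal C++ sketch. It is only an illustration under stated assumptions: the enum is a stand-in rather than the generated protobuf type, its numeric values are assumed for the example, and the number given to TASK_RESIZE as well as both helper names are hypothetical.

```cpp
// Stand-in for the TaskState enum from mesos.proto. The numeric values are
// assumptions for illustration; TASK_RESIZE and its number are hypothetical.
enum TaskState {
  TASK_STARTING = 0,
  TASK_RUNNING = 1,
  TASK_FINISHED = 2,
  TASK_FAILED = 3,
  TASK_KILLED = 4,
  TASK_LOST = 5,
  TASK_STAGING = 6,
  TASK_RESIZE = 7
};

// Fragile check (hypothetical helper): treats every value above
// TASK_RUNNING as terminal, so any newly appended non-terminal state
// is misclassified.
inline bool isTerminalByOrdering(TaskState state) {
  return state > TASK_RUNNING;
}

// Explicit check (hypothetical helper): names the terminal states, so
// appending a new enum value does not change the result for it.
inline bool isTerminalExplicit(TaskState state) {
  switch (state) {
    case TASK_FINISHED:
    case TASK_FAILED:
    case TASK_KILLED:
    case TASK_LOST:
      return true;
    default:
      return false;
  }
}
```

    With these assumed values, the ordering-based check reports both TASK_STAGING and TASK_RESIZE as terminal, while the explicit check does not. Any caller that compares against TASK_RUNNING would need auditing before a new state is added.
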

src/master/master.hpp
<https://reviews.apache.org/r/22526/#comment81481>

    s/res/resources/


src/master/master.hpp
<https://reviews.apache.org/r/22526/#comment81483>

    s/message invalid/message is invalid/


src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81485>

    I see what you mean about all the duplicate checker code between here and launchTask. Perhaps refactoring launchTask would be a good precursor review.


src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81486>

    We generally prefer CopyFrom unless MergeFrom is explicitly needed.


src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81484>

    Do we still want to report the StatusUpdate to the framework if the master is already up to date?


src/scheduler/scheduler.cpp
<https://reviews.apache.org/r/22526/#comment81487>

    send(master.get(), message); ?


src/slave/slave.hpp
<https://reviews.apache.org/r/22526/#comment81488>

    Does _resizeTask need to be public too?


src/slave/slave.cpp
<https://reviews.apache.org/r/22526/#comment81489>

    Should you maybe check the future first, so you can send back an error ResizeTaskReply message rather than silently aborting the resize?
    Maybe some of these errors/returns should actually send an error ResizeTaskReply too?


- Adam B


On June 17, 2014, 11:23 a.m., Yifan Gu wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22526/
> -----------------------------------------------------------
>
> (Updated June 17, 2014, 11:23 a.m.)
>
>
> Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas Nielsen, and Vinod Kone.
>
>
> Bugs: MESOS-1279
>     https://issues.apache.org/jira/browse/MESOS-1279
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> Added resizeTask primitive.
>
> This is just a proof of concept now. I will work on the unit test.
> Currently I added one state called "TASK_RESIZE" in state update, so that the
> master/framework can get the resize result from the slave.
> I put the result in the 'data' field of the TaskStatus.
>
> And I feel that I copy-pasted a lot of checkers, which is kind of a mess. I think
> they should be put into a separate module later.
>
> Any questions or suggestions are highly welcome! Thanks!
>
>
> Diffs
> -----
>
>   include/mesos/mesos.proto 102289b
>   include/mesos/scheduler.hpp d224945
>   include/mesos/scheduler/scheduler.proto 6ab5089
>   src/Makefile.am c91b438
>   src/common/protobuf_utils.hpp 12ff00a
>   src/master/master.hpp 7a12185
>   src/master/master.cpp 4a01b1a
>   src/messages/messages.proto 8aecc8b
>   src/sched/sched.cpp 6e14f1c
>   src/scheduler/scheduler.cpp 4ae188e
>   src/slave/slave.hpp 34687e5
>   src/slave/slave.cpp 643c088
>   src/tests/resize_task_tests.cpp PRE-CREATION
>
> Diff: https://reviews.apache.org/r/22526/diff/
>
>
> Testing
> -------
>
> make check.
>
>
> Thanks,
>
> Yifan Gu
>
>
