> On June 19, 2014, 7:42 a.m., Adam B wrote: > > Great start. I didn't get into the tests at all, and only skimmed over the > > rest of the code, but here are a few comments from my first pass. > > > > Since this is a huge diff (>1500 new/modified lines), how about splitting > > it up into multiple reviews/phases. Here are a few splits I can think of: > > - Refactor launchTask to pull out common logic needed by resizeTask > > - Add scheduler API (returns error in master, does nothing in slave) > > - Add master logic and slave API (returns error in slave) > > - Add slave logic > > - Add master logic to verify/handleResizeTaskReply > > - Add metrics > > - Add tests
Great suggestion! Will do! Thanks for reviewing!! > On June 19, 2014, 7:42 a.m., Adam B wrote: > > include/mesos/mesos.proto, line 497 > > <https://reviews.apache.org/r/22526/diff/11/?file=611738#file611738line497> > > > > I worry that some code (ours or clients) may assume that a state > > > TASK_RUNNING is terminal. I know that hasn't been true since the creation > > of TASK_STAGING, but it's a general concern of mine. Perhaps it would have > > been better to have started the TERMINAL states at 10 or 100. hmm, I checked "inline bool isTerminalState(const TaskState& state)" in src/common/protobuf_utils.cpp, it checks the state individually, so I think if we use that function else where to check the TERMINAL state, then it would be fine. > On June 19, 2014, 7:42 a.m., Adam B wrote: > > include/mesos/mesos.proto, line 526 > > <https://reviews.apache.org/r/22526/diff/11/?file=611738#file611738line526> > > > > Why a boolean instead of an optional error message? Then you could do > > if(resizeTaskReply.has_error()) { LOG(ERROR) << resizeTaskReply.error(); } > > else { //do something with resizeTaskReply } Cool! That will be sweet! Thanks! > On June 19, 2014, 7:42 a.m., Adam B wrote: > > include/mesos/scheduler/scheduler.proto, lines 183-186 > > <https://reviews.apache.org/r/22526/diff/11/?file=611740#file611740line183> > > > > Where did '2' go? somewhere.. > On June 19, 2014, 7:42 a.m., Adam B wrote: > > src/master/master.cpp, lines 3368-3370 > > <https://reviews.apache.org/r/22526/diff/11/?file=611744#file611744line3368> > > > > Do we still want to report the StatusUpdate to the framework if the > > master is already up to date? Yes we do. if the function return Nothing(), then the message will be forwarded. > On June 19, 2014, 7:42 a.m., Adam B wrote: > > src/slave/slave.cpp, line 1384 > > <https://reviews.apache.org/r/22526/diff/11/?file=611749#file611749line1384> > > > > Should you maybe check the future first, so you can send back an error > > ResizeTaskReply message rather than silently aborting the resize? > > Maybe some of these errors/returns should actually send an error > > ResizeTaskReply too? True. I think if the slave is terminating, it should send back the reply. > On June 19, 2014, 7:42 a.m., Adam B wrote: > > src/scheduler/scheduler.cpp, line 371 > > <https://reviews.apache.org/r/22526/diff/11/?file=611747#file611747line371> > > > > send(master.get(), message); ? lol.. - Yifan ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/22526/#review46192 ----------------------------------------------------------- On June 17, 2014, 6:23 p.m., Yifan Gu wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/22526/ > ----------------------------------------------------------- > > (Updated June 17, 2014, 6:23 p.m.) > > > Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas > Nielsen, and Vinod Kone. > > > Bugs: MESOS-1279 > https://issues.apache.org/jira/browse/MESOS-1279 > > > Repository: mesos-git > > > Description > ------- > > Added resizeTask primitive. > > This is just a proof of concept now. I will work on the unit test. > Currently I added one state called "TASK_RESIZE" in state update, so that the > master/framework can get the resize result from the slave. > I put the result in the 'data' field of the TaskStatus. > > And I feel that I copy-pasted a lot of checkers, which is kind mess, I think > they should be put into a separate module later. > > Any question or suggestion is highly welcome! Thanks! > > > Diffs > ----- > > include/mesos/mesos.proto 102289b > include/mesos/scheduler.hpp d224945 > include/mesos/scheduler/scheduler.proto 6ab5089 > src/Makefile.am c91b438 > src/common/protobuf_utils.hpp 12ff00a > src/master/master.hpp 7a12185 > src/master/master.cpp 4a01b1a > src/messages/messages.proto 8aecc8b > src/sched/sched.cpp 6e14f1c > src/scheduler/scheduler.cpp 4ae188e > src/slave/slave.hpp 34687e5 > src/slave/slave.cpp 643c088 > src/tests/resize_task_tests.cpp PRE-CREATION > > Diff: https://reviews.apache.org/r/22526/diff/ > > > Testing > ------- > > make check. > > > Thanks, > > Yifan Gu > >
