-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/22526/#review46192
-----------------------------------------------------------


Great start. I didn't get into the tests at all, and only skimmed over the rest 
of the code, but here are a few comments from my first pass.

Since this is a huge diff (>1500 new/modified lines), how about splitting it up 
into multiple reviews/phases? Here are a few splits I can think of:
- Refactor launchTask to pull out common logic needed by resizeTask
- Add scheduler API (returns error in master, does nothing in slave)
- Add master logic and slave API (returns error in slave)
- Add slave logic
- Add master logic to verify/handleResizeTaskReply
- Add metrics
- Add tests


include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81475>

    I worry that some code (ours or clients') may assume that a state > 
TASK_RUNNING is terminal. I know that hasn't been true since the creation of 
TASK_STAGING, but it's a general concern of mine. Perhaps it would have been 
better to have started the TERMINAL states at 10 or 100.
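    To illustrate the concern: a check written as "state > TASK_RUNNING" already misclassifies TASK_STAGING as terminal, and would do the same for a new TASK_RESIZE. Below is a minimal sketch of the safer pattern, using a local enum that mirrors the existing TaskState values in mesos.proto. The TASK_RESIZE value of 7 is an assumption for illustration only, not taken from the patch:

```cpp
#include <cassert>

// Local stand-in mirroring the TaskState values in include/mesos/mesos.proto.
// TASK_RESIZE and its number are hypothetical, for illustration only.
enum TaskState {
  TASK_STAGING = 6,
  TASK_STARTING = 0,
  TASK_RUNNING = 1,
  TASK_FINISHED = 2,
  TASK_FAILED = 3,
  TASK_KILLED = 4,
  TASK_LOST = 5,
  TASK_RESIZE = 7  // hypothetical value
};

// Fragile: depends on enum ordering, and is already wrong for TASK_STAGING.
bool isTerminalByOrdering(TaskState state) {
  return state > TASK_RUNNING;
}

// Robust: list the terminal states explicitly, so appending new
// non-terminal states (TASK_STAGING, TASK_RESIZE, ...) cannot change
// the answer for existing callers.
bool isTerminal(TaskState state) {
  switch (state) {
    case TASK_FINISHED:
    case TASK_FAILED:
    case TASK_KILLED:
    case TASK_LOST:
      return true;
    default:
      return false;
  }
}
```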



include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81476>

    Why a boolean instead of an optional error message? Then you could do:
        if (resizeTaskReply.has_error()) {
          LOG(ERROR) << resizeTaskReply.error();
        } else {
          // Do something with resizeTaskReply.
        }
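    A self-contained sketch of that shape follows. ResizeTaskReply here is a hand-written, hypothetical stand-in that mimics the has_error()/error()/set_error() accessors protobuf generates for an optional string field named error; std::cerr stands in for glog's LOG(ERROR):

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical stand-in for a generated ResizeTaskReply message that has an
// "optional string error" field instead of a boolean success flag.
class ResizeTaskReply {
public:
  bool has_error() const { return hasError_; }
  const std::string& error() const { return error_; }
  void set_error(const std::string& value) {
    error_ = value;
    hasError_ = true;
  }

private:
  bool hasError_ = false;
  std::string error_;
};

// Returns a short description of how the reply was handled.
std::string handleReply(const ResizeTaskReply& reply) {
  if (reply.has_error()) {
    // Unlike a bare "false", the reason for the failure travels with it.
    std::cerr << "Resize failed: " << reply.error() << std::endl;
    return "error: " + reply.error();
  }
  // Success path: apply the new resources, notify the framework, etc.
  return "ok";
}
```

An absent field means success, so the message needs no separate flag that could disagree with the error text.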



include/mesos/mesos.proto
<https://reviews.apache.org/r/22526/#comment81477>

    Why do you need both old & new resources? In case multiple resize tasks 
come through at a time, or to handle resource changes during slave failovers or 
other tasks starting/completing?
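    One reason the old resources could be useful is an optimistic, compare-and-swap style check on the slave: apply the resize only if the task still holds the resources the scheduler based its request on, and reject it otherwise. A minimal sketch of that idea, where Resources, TaskRecord, and resize() are hypothetical simplifications rather than the Mesos classes:

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified resource vector (the real Mesos Resources is richer).
struct Resources {
  double cpus;
  double mem;  // MB
};

bool operator==(const Resources& a, const Resources& b) {
  return a.cpus == b.cpus && a.mem == b.mem;
}

// Hypothetical slave-side bookkeeping for a single task.
struct TaskRecord {
  Resources current;
};

// Compare-and-swap style resize: apply "requested" only if the task still
// holds "expected" (the old resources the request was built against).
// Returns an empty string on success, otherwise an error message.
std::string resize(TaskRecord& task,
                   const Resources& expected,
                   const Resources& requested) {
  if (!(task.current == expected)) {
    return "Stale request: task resources changed since the request was made";
  }
  task.current = requested;
  return "";
}
```

With this check, two concurrent resize requests built from the same old view cannot both succeed; the second one gets an error instead of silently overwriting the first.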



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/22526/#comment81478>

    embedded



include/mesos/scheduler.hpp
<https://reviews.apache.org/r/22526/#comment81479>

    s/is succeeded or not/succeeded/



include/mesos/scheduler/scheduler.proto
<https://reviews.apache.org/r/22526/#comment81480>

    Where did '2' go?



src/master/master.hpp
<https://reviews.apache.org/r/22526/#comment81481>

    s/res/resources/



src/master/master.hpp
<https://reviews.apache.org/r/22526/#comment81483>

    s/message invalid/message is invalid/



src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81485>

    I see what you mean about all the duplicate checker code between here and 
launchTask. Perhaps refactoring launchTask would be a good precursor review.
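    As a rough sketch of what such a refactoring could look like, the shared checks can be expressed as a list of validator functions that both launchTask and resizeTask run, with each path appending its own extra checks. All names below (TaskRequest, Validator, validateSlave, commonValidators, etc.) are hypothetical, and an empty string stands in for "no error":

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, trimmed-down request with fields both paths must validate.
struct TaskRequest {
  std::string frameworkId;
  std::string slaveId;
  double cpus;
  double mem;  // MB
};

// A check returns an error message, or an empty string if it passes.
using Validator = std::function<std::string(const TaskRequest&)>;

std::string validateFramework(const TaskRequest& r) {
  return r.frameworkId.empty() ? "Unknown framework" : "";
}

std::string validateSlave(const TaskRequest& r) {
  return r.slaveId.empty() ? "Unknown slave" : "";
}

std::string validateResources(const TaskRequest& r) {
  return (r.cpus > 0 && r.mem > 0) ? "" : "Resources must be positive";
}

// Runs the checks in order and returns the first failure, if any.
std::string validate(const TaskRequest& r, const std::vector<Validator>& checks) {
  for (const Validator& check : checks) {
    std::string error = check(r);
    if (!error.empty()) {
      return error;
    }
  }
  return "";
}

// Checks shared by launchTask and resizeTask; each caller may append more.
std::vector<Validator> commonValidators() {
  return {validateFramework, validateSlave, validateResources};
}
```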



src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81486>

    We generally prefer CopyFrom unless MergeFrom is explicitly needed.
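    The difference matters most for repeated fields: protobuf's MergeFrom appends the source's repeated fields to the target, while CopyFrom clears the target first and then merges, so stale entries (e.g. old resources) cannot linger. Below is a hypothetical stand-in struct (not the protobuf API) that reproduces those semantics for a single repeated field:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a protobuf message with one repeated field,
// reproducing the documented MergeFrom/CopyFrom semantics for that field.
struct FakeMessage {
  std::vector<std::string> resources;  // plays the role of a repeated field

  // Like Message::MergeFrom: repeated entries from "other" are appended.
  void MergeFrom(const FakeMessage& other) {
    resources.insert(resources.end(),
                     other.resources.begin(), other.resources.end());
  }

  void Clear() { resources.clear(); }

  // Like Message::CopyFrom: Clear() followed by MergeFrom().
  void CopyFrom(const FakeMessage& other) {
    if (&other == this) {
      return;
    }
    Clear();
    MergeFrom(other);
  }
};
```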



src/master/master.cpp
<https://reviews.apache.org/r/22526/#comment81484>

    Do we still want to report the StatusUpdate to the framework if the master 
is already up to date?



src/scheduler/scheduler.cpp
<https://reviews.apache.org/r/22526/#comment81487>

    send(master.get(), message); ?



src/slave/slave.hpp
<https://reviews.apache.org/r/22526/#comment81488>

    Does _resizeTask need to be public too?



src/slave/slave.cpp
<https://reviews.apache.org/r/22526/#comment81489>

    Should you check the future first, so that you can send back an error 
ResizeTaskReply message rather than silently aborting the resize?
    Perhaps some of these other errors/early returns should also send an error 
ResizeTaskReply?
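    A minimal sketch of that pattern: FakeFuture is a hypothetical stand-in loosely modeled on libprocess's Future accessors (isReady(), isFailed(), failure()), and ResizeTaskReply and _resizeTask() are simplified, hypothetical versions rather than the code in the patch. Every outcome, including a failed or discarded future, produces a reply the slave can send back:

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in loosely modeled on libprocess's Future; not the real class.
struct FakeFuture {
  enum State { PENDING, READY, FAILED, DISCARDED } state;
  std::string message;  // failure reason when state == FAILED

  bool isReady() const { return state == READY; }
  bool isFailed() const { return state == FAILED; }
  const std::string& failure() const { return message; }
};

// Hypothetical reply; an empty "error" means the resize succeeded.
struct ResizeTaskReply {
  std::string taskId;
  std::string error;
};

// Hypothetical continuation for the slave-side resize: check the future
// first and always build a reply, so the master is never left waiting
// on a resize that was silently dropped.
ResizeTaskReply _resizeTask(const std::string& taskId, const FakeFuture& future) {
  ResizeTaskReply reply{taskId, ""};
  if (future.isFailed()) {
    reply.error = "Failed to resize task: " + future.failure();
  } else if (!future.isReady()) {
    // Discarded (or, defensively, still pending).
    reply.error = "Resize of task " + taskId + " did not complete";
  }
  return reply;  // the caller sends this to the master in every case
}
```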


- Adam B


On June 17, 2014, 11:23 a.m., Yifan Gu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/22526/
> -----------------------------------------------------------
> 
> (Updated June 17, 2014, 11:23 a.m.)
> 
> 
> Review request for mesos, Adam B, Benjamin Hindman, Ben Mahler, Niklas 
> Nielsen, and Vinod Kone.
> 
> 
> Bugs: MESOS-1279
>     https://issues.apache.org/jira/browse/MESOS-1279
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> Added resizeTask primitive.
> 
> This is just a proof of concept now. I will work on the unit test.
> Currently I have added one state, called "TASK_RESIZE", to status updates, so that the 
> master/framework can get the resize result from the slave.
> I put the result in the 'data' field of the TaskStatus.
> 
> And I feel that I copy-pasted a lot of checkers, which is kind of a mess; I think 
> they should be moved into a separate module later.
> 
> Any question or suggestion is highly welcome! Thanks!
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 102289b 
>   include/mesos/scheduler.hpp d224945 
>   include/mesos/scheduler/scheduler.proto 6ab5089 
>   src/Makefile.am c91b438 
>   src/common/protobuf_utils.hpp 12ff00a 
>   src/master/master.hpp 7a12185 
>   src/master/master.cpp 4a01b1a 
>   src/messages/messages.proto 8aecc8b 
>   src/sched/sched.cpp 6e14f1c 
>   src/scheduler/scheduler.cpp 4ae188e 
>   src/slave/slave.hpp 34687e5 
>   src/slave/slave.cpp 643c088 
>   src/tests/resize_task_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/22526/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Yifan Gu
> 
>
