> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > I've only gotten halfway through ... but there is a bunch here already. I'd 
> > like to break this up into at least four patches. (1) The utils stuff that 
> > was added. (2) The master changes. (3) The slave::path namespace stuff. (3) 
> > The status update manager API + implementation (but not the slave using it 
> > yet). And (4) the slave using each of these components, and the executor 
> > changes that are included.
> > 
> > These comments are across all of those patches, but I'll make future passes 
> > on each of those components.
> 
> Vinod Kone wrote:
>     addressed the comments for the utils.hpp part. Will send a review for 
> utils.hpp and protobuf_utils.hpp (forgot to include it in this review) 
> shortly.

patch coming for path refactoring shortly.


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 58
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line58>
> >
> >     s/follows/follows:

done


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 66
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line66>
> >
> >     What is the framework PID? How is that different than the Executor PID 
> > mentioned below?

we need to store framework pid because, when an executor re-registers we need 
its framework's PID to re-create a new Framework() object. 


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 218
> > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line218>
> >
> >     Isn't the default workd_dir mentioned in the addOption above 
> > /tmp/mesos? If so, "/tmp" should be "/tmp/mesos" here, and we probably 
> > don't want "/tmp/mesos/mesos". Just clean this up so people understand 
> > expectations (i.e., should work_dir be a path including the "mesos" 
> > directory, or will we create that ourselves).

I agree its a bit confusing. The reason I did it this way is to avoid dumping 
work/meta directories directly under '/tmp' (for eg. when an user specifies 
work_dir=/tmp).
 
I'm indifferent on how we want to do this. I will revert back to 'workRootDir = 
conf.get("work_dir", "/tmp/mesos") + "/work"' for now;


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 56
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line56>
> >
> >     I'd like to stick all of this stuff in it's own file and commit this on 
> > it's own (with integration in Slave as applicable and tests).

refactored out path stuff into slave/path.hpp 


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, lines 344-345
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line344>
> >
> >     What's the difference between "work" and "work root"? Or "meta" and 
> > "meta root"?

workRootDir denotes the root directory (conf['work_dir'/mesos/work]) where work 
directories of a slave are stored. it's not exactly a slave's working directory 
(that would be workRootDir/slaves/slaveId).

i needed the workRootDir for the path layout stuff.

same goes with metaRootDir


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 2289
> > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line2289>
> >
> >     Pass these into writeFrameworkPID, no need to make this an instance 
> > function. More importantly, you should do this for each of these 
> > writers/readers.

moved these write/read functions into path.hpp


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/#review7232
-----------------------------------------------------------


On 2012-04-19 16:53:07, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4462/
> -----------------------------------------------------------
> 
> (Updated 2012-04-19 16:53:07)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> Sorry for the huge  CL!
> 
> Slave restarts now supports recovery!
> --> Non-disruptive restart means running tasks are not lost
> --> Re-connects with live executors
> --> Checkpoints and reliably sends status updates
> --> Ability to kill executors if the slave upgrade is incompatible with 
> running executors
> 
> 
> This addresses bug mesos-110.
>     https://issues.apache.org/jira/browse/mesos-110
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am d5edaa2 
>   src/common/hashset.hpp 1feb610 
>   src/common/utils.hpp 1d81e21 
>   src/exec/exec.cpp e8db407 
>   src/launcher/launcher.cpp a141b9a 
>   src/local/local.hpp 55f9eaf 
>   src/local/local.cpp affe432 
>   src/master/master.cpp 4dc9ee0 
>   src/messages/messages.proto 87e1548 
>   src/sched/sched.cpp dcadb10 
>   src/scripts/killtree.sh bceae9d 
>   src/slave/constants.hpp f0c8679 
>   src/slave/http.cpp 19c48a0 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp 66a2a89 
>   src/slave/main.cpp 85cba25 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 2b37d42 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
>   src/slave/statusupdates_manager.hpp PRE-CREATION 
>   src/slave/statusupdates_manager.cpp PRE-CREATION 
>   src/tests/external_tests.cpp d1b20e4 
>   src/tests/fault_tolerance_tests.cpp 6772daf 
>   src/tests/slave_restart_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp e81ec82 
> 
> Diff: https://reviews.apache.org/r/4462/diff
> 
> 
> Testing
> -------
> 
> make check.
> 
> Note that only the new test in tests/slave_restart_tests.cpp  engages in 
> recovery!
> 
> Recovery is disabled for old tests (though they still checkpoint relevant 
> info!)
> 
> 
> Thanks,
> 
> Vinod
> 
>

Reply via email to