> On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > I've only gotten halfway through ... but there is a bunch here already. I'd > > like to break this up into at least four patches. (1) The utils stuff that > > was added. (2) The master changes. (3) The slave::path namespace stuff. (3) > > The status update manager API + implementation (but not the slave using it > > yet). And (4) the slave using each of these components, and the executor > > changes that are included. > > > > These comments are across all of those patches, but I'll make future passes > > on each of those components. > > Vinod Kone wrote: > addressed the comments for the utils.hpp part. Will send a review for > utils.hpp and protobuf_utils.hpp (forgot to include it in this review) > shortly.
patch coming for path refactoring shortly. > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.hpp, line 58 > > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line58> > > > > s/follows/follows: done > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.hpp, line 66 > > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line66> > > > > What is the framework PID? How is that different than the Executor PID > > mentioned below? we need to store framework pid because, when an executor re-registers we need its framework's PID to re-create a new Framework() object. > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.cpp, line 218 > > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line218> > > > > Isn't the default workd_dir mentioned in the addOption above > > /tmp/mesos? If so, "/tmp" should be "/tmp/mesos" here, and we probably > > don't want "/tmp/mesos/mesos". Just clean this up so people understand > > expectations (i.e., should work_dir be a path including the "mesos" > > directory, or will we create that ourselves). I agree its a bit confusing. The reason I did it this way is to avoid dumping work/meta directories directly under '/tmp' (for eg. when an user specifies work_dir=/tmp). I'm indifferent on how we want to do this. I will revert back to 'workRootDir = conf.get("work_dir", "/tmp/mesos") + "/work"' for now; > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.hpp, line 56 > > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line56> > > > > I'd like to stick all of this stuff in it's own file and commit this on > > it's own (with integration in Slave as applicable and tests). refactored out path stuff into slave/path.hpp > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.hpp, lines 344-345 > > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line344> > > > > What's the difference between "work" and "work root"? Or "meta" and > > "meta root"? workRootDir denotes the root directory (conf['work_dir'/mesos/work]) where work directories of a slave are stored. it's not exactly a slave's working directory (that would be workRootDir/slaves/slaveId). i needed the workRootDir for the path layout stuff. same goes with metaRootDir > On 2012-04-25 22:11:01, Benjamin Hindman wrote: > > src/slave/slave.cpp, line 2289 > > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line2289> > > > > Pass these into writeFrameworkPID, no need to make this an instance > > function. More importantly, you should do this for each of these > > writers/readers. moved these write/read functions into path.hpp - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/4462/#review7232 ----------------------------------------------------------- On 2012-04-19 16:53:07, Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/4462/ > ----------------------------------------------------------- > > (Updated 2012-04-19 16:53:07) > > > Review request for mesos, Benjamin Hindman and John Sirois. > > > Summary > ------- > > Sorry for the huge CL! > > Slave restarts now supports recovery! > --> Non-disruptive restart means running tasks are not lost > --> Re-connects with live executors > --> Checkpoints and reliably sends status updates > --> Ability to kill executors if the slave upgrade is incompatible with > running executors > > > This addresses bug mesos-110. > https://issues.apache.org/jira/browse/mesos-110 > > > Diffs > ----- > > src/Makefile.am d5edaa2 > src/common/hashset.hpp 1feb610 > src/common/utils.hpp 1d81e21 > src/exec/exec.cpp e8db407 > src/launcher/launcher.cpp a141b9a > src/local/local.hpp 55f9eaf > src/local/local.cpp affe432 > src/master/master.cpp 4dc9ee0 > src/messages/messages.proto 87e1548 > src/sched/sched.cpp dcadb10 > src/scripts/killtree.sh bceae9d > src/slave/constants.hpp f0c8679 > src/slave/http.cpp 19c48a0 > src/slave/isolation_module.hpp c896908 > src/slave/lxc_isolation_module.hpp b7beefe > src/slave/lxc_isolation_module.cpp 66a2a89 > src/slave/main.cpp 85cba25 > src/slave/process_based_isolation_module.hpp f6f9554 > src/slave/process_based_isolation_module.cpp 2b37d42 > src/slave/slave.hpp 279bc7b > src/slave/slave.cpp 3358ec4 > src/slave/statusupdates_manager.hpp PRE-CREATION > src/slave/statusupdates_manager.cpp PRE-CREATION > src/tests/external_tests.cpp d1b20e4 > src/tests/fault_tolerance_tests.cpp 6772daf > src/tests/slave_restart_tests.cpp PRE-CREATION > src/tests/utils.hpp e81ec82 > > Diff: https://reviews.apache.org/r/4462/diff > > > Testing > ------- > > make check. > > Note that only the new test in tests/slave_restart_tests.cpp engages in > recovery! > > Recovery is disabled for old tests (though they still checkpoint relevant > info!) > > > Thanks, > > Vinod > >
