> On March 1, 2013, 12:15 a.m., Ben Mahler wrote: > > src/slave/cgroups_isolation_module.cpp, line 505 > > <https://reviews.apache.org/r/8535/diff/8/?file=261376#file261376line505> > > > > Can you make this comment a little clearer? My suggestion: > > > > Before > > // Checkpoint the forked pid, if necessary. > > // We do this here instead of in the parent process, because if we > > do it > > // in the slave, the slave might die immediately after it forks but > > before > > // it writes the the pid to disk. This will result in an orphaned > > executor > > // process that the slave has no idea about when doing recovery. > > > > s/the the/the > > > > After: > > // Checkpoint the forked pid, if necessary. > > // The checkpointing must be done in the forked process, because > > // the slave process can die immediately after the isolation module > > forks but before it would have > > // a chance to write the pid to disk. That would result in an > > orphaned > > // executor process unknown to the slave when doing recovery. > >
done. thanks. > On March 1, 2013, 12:15 a.m., Ben Mahler wrote: > > src/slave/lxc_isolation_module.cpp, line 124 > > <https://reviews.apache.org/r/8535/diff/8/?file=261380#file261380line124> > > > > Ditto on my earlier review, I'd rather this send TASK_FAILED or > > TASK_LOST with this message. Or at least, please add a TODO that this is > > not implemented because we're deprecating the lxc isolation module? > > > > Seems pretty bad that a framework can cause slaves to exit at will! Thansk for the push. Decided to make this a CHECK instead. Also added logic in slave/main.cpp to stop users from starting a slave w/ lxc and checkpointing. I think failing fast is the right semantics. > On March 1, 2013, 12:15 a.m., Ben Mahler wrote: > > src/slave/slave.hpp, line 137 > > <https://reviews.apache.org/r/8535/diff/8/?file=261384#file261384line137> > > > > Why did you move these in this review, as opposed to the review where > > you introduced them? moved them back. > On March 1, 2013, 12:15 a.m., Ben Mahler wrote: > > src/slave/slave.hpp, line 196 > > <https://reviews.apache.org/r/8535/diff/8/?file=261384#file261384line196> > > > > s/This callback is called/This continuation runs I don't like "this continuation runs". Instead, I went with s/callback// > On March 1, 2013, 12:15 a.m., Ben Mahler wrote: > > src/slave/slave.cpp, line 915 > > <https://reviews.apache.org/r/8535/diff/8/?file=261385#file261385line915> > > > > getLibprocessPIDPath sounds ambiguous, maybe this would be better named > > as getExecutorDriverPIDPath? We chose "libprocess" in the pid name, to differentiate with the os pid ('forkedpid'). - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8535/#review17210 ----------------------------------------------------------- On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8535/ > ----------------------------------------------------------- > > (Updated Feb. 23, 2013, 8:12 a.m.) > > > Review request for mesos, Benjamin Hindman and Ben Mahler. > > > Description > ------- > > Checkpoints slave id, framework pid, executor pids (libprocess, execed, > forked) and task info. > > > Diffs > ----- > > include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 > src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 > src/slave/cgroups_isolation_module.hpp > 669efa14ba2603764aa68ae19a44e79dbfdec192 > src/slave/cgroups_isolation_module.cpp > a2eba6f96f5d8a4b1257571aa29e37c5682aab8d > src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd > src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 > src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 > src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c > src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a > src/slave/process_based_isolation_module.hpp > f1817192582e3646f8dcf17934ba7998829e8fd6 > src/slave/process_based_isolation_module.cpp > 12a579cba56cd3dac384bc7919b0d5537b0e429d > src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 > src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd > src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 > src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b > src/tests/protobuf_io_tests.cpp 5bd1e676761ac35d30b9491fb397830c8a008795 > src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 > src/tests/status_update_manager_tests.cpp PRE-CREATION > src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 > > Diff: https://reviews.apache.org/r/8535/diff/ > > > Testing > ------- > > make check > > > Thanks, > > Vinod Kone > >
