----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8535/#review14456 -----------------------------------------------------------
src/exec/exec.cpp <https://reviews.apache.org/r/8535/#comment30767> Can you add these for getpid? #include <unistd.h> #include <sys/types.h> src/messages/messages.proto <https://reviews.apache.org/r/8535/#comment30768> Again, I'm scared about adding required fields. src/slave/cgroups_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30769> I know it's the same but given what you're checking I think it's clearer to leave the logic to read literally as your message reads: !(checkpoint && path.isNone()) or if (checkpoint) { CHECK_SOME(path) << "Blah"; } src/slave/cgroups_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30771> Use CHECK_SOME (you don't need to store the write result if you don't like). Change the message to use the 'failed' format. CHECK_SOME(os::write(path.get(), stringify(getpid()))) << "Failed to checkpoint forked pid " << add pid to message? src/slave/flags.hpp <https://reviews.apache.org/r/8535/#comment30772> s/Enable checkpointing/checkpoint src/slave/flags.hpp <https://reviews.apache.org/r/8535/#comment30776> Can you be more explicit about the fact that checkpoint enables slave recovery? Or is there another flag for that? Can you mention that we checkpoint to disk? (there's a cost of disk io) src/slave/lxc_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30777> ditto on the check src/slave/lxc_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30778> ditto on my other comments with check_some and the check message src/slave/process_based_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30779> ditto on the check src/slave/process_based_isolation_module.cpp <https://reviews.apache.org/r/8535/#comment30780> ditto on the check_some and the check message src/slave/slave.hpp <https://reviews.apache.org/r/8535/#comment30781> With checkpointing enabled, will the slave id stay the same across recovered rolling of the slave? src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30782> Do we end these messages with periods? src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30786> Can you add a relevant message? src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30783> might as well not store result, if not storing the mkdir result either: CHECK_SOME(::protobuf::write(path, id)) << "Error checkpointing slave id " << id; src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30787> ditto for adding message src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30784> ditto for not storing result if you don't like src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30788> ditto for adding message src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30789> ditto for not storing if you don't like src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30790> ditto for not storing src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30791> ditto for adding message src/slave/slave.cpp <https://reviews.apache.org/r/8535/#comment30793> ditto for not storing - Ben Mahler On Dec. 12, 2012, 11:16 p.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8535/ > ----------------------------------------------------------- > > (Updated Dec. 12, 2012, 11:16 p.m.) > > > Review request for mesos, Benjamin Hindman and Ben Mahler. > > > Description > ------- > > Checkpoints slave id, framework pid, executor pids (libprocess, execed, > forked) and task info. > > > Diffs > ----- > > src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 > src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f > src/slave/cgroups_isolation_module.hpp > 669efa14ba2603764aa68ae19a44e79dbfdec192 > src/slave/cgroups_isolation_module.cpp > 0f2975d1adf874dba4d0a539eb5c99233cef6e6b > src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd > src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 > src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 > src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c > src/slave/process_based_isolation_module.hpp > f1817192582e3646f8dcf17934ba7998829e8fd6 > src/slave/process_based_isolation_module.cpp > 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 > src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 > src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 > src/tests/protobuf_io_tests.cpp 979efac0f28c3d361f5c347f15933d89d9356bc9 > src/tests/status_update_manager_tests.cpp PRE-CREATION > src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 > > Diff: https://reviews.apache.org/r/8535/diff/ > > > Testing > ------- > > make check > > Will write new tests as part of Part 5, when I write the analogous > read/replay functions of the checkpoint functions. > > > Thanks, > > Vinod Kone > >
