> On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/tests/protobuf_io_tests.cpp, line 57 > > <https://reviews.apache.org/r/8535/diff/1/?file=237229#file237229line57> > > > > Did the compiler want/need these?
yes. because we have two different namespaces mesos::internal::protobuf (common/protobuf_utils.hpp) and protobuf (stout/protobuf.hpp) > On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/slave/slave.cpp, line 500 > > <https://reviews.apache.org/r/8535/diff/1/?file=237228#file237228line500> > > > > I wonder if the flag should really be called 'checkpoint' and the > > 'recover' flag should be whether or not to recover from an existing > > checkpoint? As discussed offline, we are going to have two flags: 'checkpoint' (boolean) and 'recover' (string). I will add the 'recover' flag in part 5. > On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/slave/slave.cpp, lines 499-500 > > <https://reviews.apache.org/r/8535/diff/1/?file=237228#file237228line499> > > > > Hmm, I'm not sure how I feel about these semantics. I think I'd prefer > > to see the frameworks tasks keep going lost or failed with a message > > "Couldn't launch task because framework expects checkpointing but slave > > doesn't support it ..." rather than the slave being killed. Thoughts? Agreed. Fixed. > On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/slave/slave.cpp, line 600 > > <https://reviews.apache.org/r/8535/diff/1/?file=237228#file237228line600> > > > > Could we deal with some of these failures by just killing the > > framework/task? As discussed offline, failing fast when there are file i/o errors keeps the code clean and simple. > On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/slave/slave.cpp, line 1312 > > <https://reviews.apache.org/r/8535/diff/1/?file=237228#file237228line1312> > > > > Not following this ... removed this in favor of just checking for error. I added a TODO in one of the earlier reviews to fix os::write/read and protobuf::write/read to not return bools. > On Dec. 12, 2012, 6:51 p.m., Benjamin Hindman wrote: > > src/slave/slave.cpp, lines 1330-1331 > > <https://reviews.apache.org/r/8535/diff/1/?file=237228#file237228line1330> > > > > Why a CHECK above and just a LOG here? good catch. reverted everything to CHECK - Vinod ----------------------------------------------------------- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/8535/#review14360 ----------------------------------------------------------- On Dec. 12, 2012, 8:58 a.m., Vinod Kone wrote: > > ----------------------------------------------------------- > This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/8535/ > ----------------------------------------------------------- > > (Updated Dec. 12, 2012, 8:58 a.m.) > > > Review request for mesos, Benjamin Hindman and Ben Mahler. > > > Description > ------- > > Checkpoints slave id, framework pid, executor pids (libprocess, execed, > forked) and task info. > > > Diffs > ----- > > src/exec/exec.cpp 11cb64dd3f6d3992072582f2b6d6e6d9220eb48d > src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f > src/slave/cgroups_isolation_module.hpp > 669efa14ba2603764aa68ae19a44e79dbfdec192 > src/slave/cgroups_isolation_module.cpp > 0f2975d1adf874dba4d0a539eb5c99233cef6e6b > src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd > src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 > src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 > src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c > src/slave/process_based_isolation_module.hpp > f1817192582e3646f8dcf17934ba7998829e8fd6 > src/slave/process_based_isolation_module.cpp > ede080069d814f410662f759806d1d5b260e8569 > src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba > src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e > src/tests/protobuf_io_tests.cpp 979efac0f28c3d361f5c347f15933d89d9356bc9 > src/tests/status_update_manager_tests.cpp PRE-CREATION > src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 > > Diff: https://reviews.apache.org/r/8535/diff/ > > > Testing > ------- > > make check > > Will write new tests as part of Part 5, when I write the analogous > read/replay functions of the checkpoint functions. > > > Thanks, > > Vinod Kone > >
