-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16238/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman and Vinod Kone.
Bugs: MESOS-844
https://issues.apache.org/jira/browse/MESOS-844
Repository: mesos-git
Description
-------
See MESOS-844. This uses the boot id to check whether the underlying machine
has rebooted. If it has, we do not want to attempt recovery, since the PIDs
are stale and the machine information may have changed.
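A minimal sketch of the reboot check described above follows. It is not the code in this diff. The helpers `readBootId` and `shouldRecover` are hypothetical, and `std::optional` stands in for stout's `Option`. It assumes Linux, where the kernel exposes a per-boot UUID at /proc/sys/kernel/random/boot_id. Other platforms (e.g. OS X, where the test below was run) would need a different source, such as the boot time.

```cpp
#include <fstream>
#include <optional>
#include <string>

// Hypothetical helper: reads the current boot id from the given file.
// Returns std::nullopt if the file is missing, unreadable, or empty.
// The default path is Linux-specific.
std::optional<std::string> readBootId(
    const std::string& path = "/proc/sys/kernel/random/boot_id")
{
  std::ifstream in(path);
  std::string id;
  if (!in || !std::getline(in, id) || id.empty()) {
    return std::nullopt;
  }
  return id;
}

// Hypothetical decision: attempt recovery only if the boot id recorded
// by the previous slave run matches the current one. A mismatch means
// the machine rebooted, so the previously recorded PIDs are stale.
bool shouldRecover(const std::string& recordedBootId,
                   const std::string& currentBootId)
{
  return recordedBootId == currentBootId;
}
```

Comparing an opaque per-boot identifier avoids having to reason about clock adjustments, which a boot *time* comparison might need to tolerate.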
Diffs
-----
src/slave/paths.hpp 8f80b931a896cd7e317658147e864410558d5028
src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5
src/slave/state.cpp bf267b5ea23a7427c7cb05ab4e98af2e44d9cc43
src/tests/slave_recovery_tests.cpp 250083d380bf6d0bdba09abd239a9139a402008f
Diff: https://reviews.apache.org/r/16238/diff/
Testing
-------
Added an integration test; ran make check.
I still need to clean up the boot time code to handle multiple slaves; the MultipleSlaves test currently fails:
[ RUN ] SlaveRecoveryProcessIsolatorTest/0.MultipleSlaves
F1212 20:26:35.767004 273121280 state.cpp:69] CHECK_SOME(id): No boot time found
*** Check failure stack trace: ***
@ 0x10f35ea18 google::LogMessage::Flush()
@ 0x10f3600d1 google::LogMessageFatal::~LogMessageFatal()
@ 0x10e410a36 _CheckSome::~_CheckSome()
@ 0x10f0eb6a1 mesos::internal::slave::state::recover()
@ 0x10f13924f process::AsyncExecutorProcess::execute<>()
@ 0x10f11dd4f std::tr1::_Mem_fn<>::operator()()
@ 0x10f11ddcb std::tr1::_Bind<>::operator()<>()
@ 0x10f11de1c std::tr1::_Function_handler<>::_M_invoke()
@ 0x10f138fd3 process::internal::rdispatcher<>()
@ 0x10f11fe8b std::tr1::_Bind<>::operator()<>()
@ 0x10f11ffc8 std::tr1::_Function_handler<>::_M_invoke()
@ 0x10f2850d0 process::ProcessManager::resume()
@ 0x10f285ad8 process::schedule()
@ 0x7fff8bd05772 _pthread_start
@ 0x7fff8bcf21a1 thread_start
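The trace shows `state::recover()` aborting via `CHECK_SOME(id)` when no boot time is found. One possible cleanup is to surface a missing recorded id to the caller instead of aborting. This is only an assumption sketched for discussion, not what the patch does. `RebootCheck` and `checkReboot` are hypothetical names, and `std::optional` again stands in for stout's `Option`.

```cpp
#include <optional>
#include <string>

// Hypothetical result of comparing the recorded and current boot ids.
enum class RebootCheck {
  SAME_BOOT,  // Ids match: recorded PIDs may still be valid.
  REBOOTED,   // Ids differ: skip recovery, the PIDs are stale.
  UNKNOWN     // An id is missing: let the caller decide what to do.
};

// Returns UNKNOWN instead of CHECK-failing when either id is absent.
RebootCheck checkReboot(const std::optional<std::string>& recorded,
                        const std::optional<std::string>& current)
{
  if (!recorded.has_value() || !current.has_value()) {
    return RebootCheck::UNKNOWN;
  }
  return *recorded == *current ? RebootCheck::SAME_BOOT
                               : RebootCheck::REBOOTED;
}
```

With this shape, the caller chooses an explicit policy for `UNKNOWN`, for example skipping recovery or treating it as a fresh start, rather than the process aborting.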
Thanks,
Ben Mahler