[ https://issues.apache.org/jira/browse/MESOS-676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Benjamin Mahler resolved MESOS-676.
-----------------------------------
Resolution: Won't Fix
I'm going to leave this as is. If the master is sending delayed re-registered
messages, we should likely just have the slave die for safety rather than try
to work around it without MESOS-677.
I added a disclaimer on that piece of code here:
https://reviews.apache.org/r/13954/
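
For reference, here is a minimal standalone sketch of the behavior being kept.
It is not the slave code itself: the Outcome enum and the onReregistered
function are hypothetical names made up for illustration, and the State
ordering (RECOVERING first) is an assumption based on the "Unexpected slave
state 0" line in the log quoted below. It only models the decision made for
each state when a re-registered message arrives.

```cpp
#include <string>

// Hypothetical stand-in for the slave states. Putting RECOVERING first
// (value 0) is an assumption inferred from the "state 0" log line.
enum class State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Hypothetical summary of what the handler does; not a Mesos type.
enum class Outcome { Registered, AlreadyRegistered, Ignored, WrongId, Fatal };

// Decide what happens when a re-registered message arrives in a given state.
// expectedId models info.id(); receivedId models the slaveId in the message.
Outcome onReregistered(State state,
                       const std::string& expectedId,
                       const std::string& receivedId)
{
  switch (state) {
    case State::DISCONNECTED:
      // Normal path: accept, but exit if the master sent a different id.
      return expectedId == receivedId ? Outcome::Registered
                                      : Outcome::WrongId;
    case State::RUNNING:
      // Duplicate message: tolerated with a warning if the id matches.
      return expectedId == receivedId ? Outcome::AlreadyRegistered
                                      : Outcome::WrongId;
    case State::TERMINATING:
      // Shutting down anyway, so the message is dropped.
      return Outcome::Ignored;
    case State::RECOVERING:
    default:
      // A (possibly delayed) message during recovery is treated as fatal.
      return Outcome::Fatal;
  }
}
```

Under these assumptions, a delayed message that reaches a slave which has
restarted and is still recovering maps to Outcome::Fatal, which matches the
abort reported in the issue.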
> Slave::reregistered LOG(FATAL)s due to being in RECOVERING state.
> -----------------------------------------------------------------
>
> Key: MESOS-676
> URL: https://issues.apache.org/jira/browse/MESOS-676
> Project: Mesos
> Issue Type: Bug
> Reporter: Benjamin Mahler
> Assignee: Benjamin Mahler
> Fix For: 0.14.0
>
>
> void Slave::reregistered(const SlaveID& slaveId)
> {
>   switch(state) {
>     case DISCONNECTED:
>       LOG(INFO) << "Re-registered with master " << master;
>       state = RUNNING;
>       if (!(info.id() == slaveId)) {
>         EXIT(1) << "Re-registered but got wrong id: " << slaveId
>                 << "(expected: " << info.id() << "). Committing suicide";
>       }
>       break;
>     case RUNNING:
>       // Already re-registered!
>       if (!(info.id() == slaveId)) {
>         EXIT(1) << "Re-registered but got wrong id: " << slaveId
>                 << "(expected: " << info.id() << "). Committing suicide";
>       }
>       LOG(WARNING) << "Already re-registered with master " << master;
>       break;
>     case TERMINATING:
>       LOG(WARNING) << "Ignoring re-registration because slave is terminating";
>       break;
>     case RECOVERING:
>     default:
>       LOG(FATAL) << "Unexpected slave state " << state;
>       break;
>   }
> }
> Saw a slave fail because of this last case statement:
> F0903 02:01:26.436521 42417 slave.cpp:672] Unexpected slave state 0
> *** Check failure stack trace: ***
> @ 0x7f042c579d8d google::LogMessage::Fail()
> @ 0x7f042c57dd77 google::LogMessage::SendToLog()
> @ 0x7f042c57c674 google::LogMessage::Flush()
> @ 0x7f042c57c8a6 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f042c21db8a mesos::internal::slave::Slave::reregistered()
> @ 0x7f042c276c1d ProtobufProcess<>::handler1<>()
> @ 0x7f042c24560a std::tr1::_Function_handler<>::_M_invoke()
> @ 0x7f042c27702b ProtobufProcess<>::visit()
> @ 0x7f042c46baf4 process::ProcessManager::resume()
> @ 0x7f042c46c54f process::schedule()
> @ 0x7f042bbd983d start_thread
> @ 0x7f042a5bbf8d clone
> /usr/local/bin/mesos-slave.sh: line 117: 42408 Aborted (core
> dumped) /usr/local/sbin/mesos-slave --port=5051
> --resources="${MESOS_RESOURCES}" --attributes="${MESOS_ATTRIBUTES}"
> --master="${master_zoo_url}" --log_dir="${log_dir}" ${EXTRA_FLAGS} "$@"
> Slave Exit Status: 134