Benjamin Mahler created MESOS-676:
-------------------------------------

             Summary: Slave::reregistered LOG(FATAL)s due to being in RECOVERING state.
                 Key: MESOS-676
                 URL: https://issues.apache.org/jira/browse/MESOS-676
             Project: Mesos
          Issue Type: Bug
            Reporter: Benjamin Mahler
            Assignee: Benjamin Mahler
             Fix For: 0.14.0

    void Slave::reregistered(const SlaveID& slaveId)
    {
      switch(state) {
        case DISCONNECTED:
          LOG(INFO) << "Re-registered with master " << master;
          state = RUNNING;
          if (!(info.id() == slaveId)) {
            EXIT(1) << "Re-registered but got wrong id: " << slaveId
                    << "(expected: " << info.id() << "). Committing suicide";
          }
          break;
        case RUNNING:
          // Already re-registered!
          if (!(info.id() == slaveId)) {
            EXIT(1) << "Re-registered but got wrong id: " << slaveId
                    << "(expected: " << info.id() << "). Committing suicide";
          }
          LOG(WARNING) << "Already re-registered with master " << master;
          break;
        case TERMINATING:
          LOG(WARNING) << "Ignoring re-registration because slave is terminating";
          break;
        case RECOVERING:
        default:
          LOG(FATAL) << "Unexpected slave state " << state;
          break;
      }
    }

Saw a slave abort because of the last case statement: a re-registration message arrived while the slave was still RECOVERING, which falls through to the LOG(FATAL) in the default branch ("Unexpected slave state 0" below is the RECOVERING state):

    F0903 02:01:26.436521 42417 slave.cpp:672] Unexpected slave state 0
    *** Check failure stack trace: ***
        @     0x7f042c579d8d  google::LogMessage::Fail()
        @     0x7f042c57dd77  google::LogMessage::SendToLog()
        @     0x7f042c57c674  google::LogMessage::Flush()
        @     0x7f042c57c8a6  google::LogMessageFatal::~LogMessageFatal()
        @     0x7f042c21db8a  mesos::internal::slave::Slave::reregistered()
        @     0x7f042c276c1d  ProtobufProcess<>::handler1<>()
        @     0x7f042c24560a  std::tr1::_Function_handler<>::_M_invoke()
        @     0x7f042c27702b  ProtobufProcess<>::visit()
        @     0x7f042c46baf4  process::ProcessManager::resume()
        @     0x7f042c46c54f  process::schedule()
        @     0x7f042bbd983d  start_thread
        @     0x7f042a5bbf8d  clone
    /usr/local/bin/mesos-slave.sh: line 117: 42408 Aborted (core dumped) /usr/local/sbin/mesos-slave --port=5051 --resources="${MESOS_RESOURCES}" --attributes="${MESOS_ATTRIBUTES}" --master="${master_zoo_url}" --log_dir="${log_dir}" ${EXTRA_FLAGS} "$@"
    Slave Exit Status: 134
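For illustration only, here is a minimal, self-contained sketch of one way the RECOVERING case could be handled: the message is logged and dropped instead of hitting LOG(FATAL). It assumes it is safe to ignore a re-registration that arrives mid-recovery and let re-registration happen once recovery completes; this is not necessarily the change that landed for 0.14.0. `SlaveSketch`, `Outcome`, and the string-typed `id` are hypothetical stand-ins for the real `Slave`/`SlaveID`, `std::cerr` replaces glog, and the enumerator order is assumed from the log above, where state 0 is RECOVERING.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Assumed enumerator order: the crash log reports RECOVERING as state 0.
enum class State { RECOVERING, DISCONNECTED, RUNNING, TERMINATING };

// Hypothetical result type so each branch can be observed without glog/EXIT.
enum class Outcome { REREGISTERED, ALREADY_REREGISTERED, IGNORED, WRONG_ID };

// Hypothetical, simplified stand-in for mesos::internal::slave::Slave.
struct SlaveSketch {
  State state = State::RECOVERING;
  std::string id;

  Outcome reregistered(const std::string& slaveId) {
    switch (state) {
      case State::DISCONNECTED:
        if (slaveId != id) return Outcome::WRONG_ID;  // the real code calls EXIT(1)
        state = State::RUNNING;
        return Outcome::REREGISTERED;
      case State::RUNNING:
        if (slaveId != id) return Outcome::WRONG_ID;  // the real code calls EXIT(1)
        std::cerr << "Already re-registered with master\n";
        return Outcome::ALREADY_REREGISTERED;
      case State::TERMINATING:
        std::cerr << "Ignoring re-registration because slave is terminating\n";
        return Outcome::IGNORED;
      case State::RECOVERING:
        // Assumed fix: drop the message rather than aborting the process.
        std::cerr << "Ignoring re-registration because slave is recovering\n";
        return Outcome::IGNORED;
    }
    return Outcome::IGNORED;  // not reached for valid enumerators
  }
};
```

With this shape, a slave still in RECOVERING stays in that state and keeps running; only DISCONNECTED transitions to RUNNING, matching the original code's behaviour for the other states.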