Benjamin Mahler created MESOS-1376:
--------------------------------------
Summary: CHECK failure in the Registrar
Key: MESOS-1376
URL: https://issues.apache.org/jira/browse/MESOS-1376
Project: Mesos
Issue Type: Bug
Components: master
Affects Versions: 0.19.0
Reporter: Benjamin Mahler
Priority: Blocker
Fix For: 0.19.0
{noformat}
I0515 05:44:37.049137 7179 master.cpp:2301] Ignoring re-register slave message
from slave 20140416-015639-1890854154-5050-1354-24152 at
slave(1)@10.34.119.132:5051 (smf1-aep-35-sr1.prod.twitter.com) as readmission
is already in progress
E0515 05:44:37.271734 7168 registrar.cpp:500] Registrar aborting: Failed to
update 'registry': Failed to perform store within 5secs
F0515 05:44:37.271728 7170 master.cpp:2341] Failed to readmit slave
20140416-015639-1890854154-5050-1354-24133 at slave(1)@10.34.119.131:5051
(smf1-aep-31-sr4.prod.twitter.com): Failed to update 'registry': Failed to
perform store within 5secs
*** Check failure stack trace: ***
F0515 05:44:37.272384 7168 owned.hpp:103] Check failed: data->t != NULL This
owned pointer has already been shared
*** Check failure stack trace: ***
@ 0x7f687d06e2ad google::LogMessage::Fail()
@ 0x7f687d06e2ad google::LogMessage::Fail()
@ 0x7f687d0700f4 google::LogMessage::SendToLog()
@ 0x7f687d0700f4 google::LogMessage::SendToLog()
@ 0x7f687d06de9c google::LogMessage::Flush()
@ 0x7f687d06de9c google::LogMessage::Flush()
@ 0x7f687d0709e9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f687d0709e9 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f687cc46182 process::Owned<>::get()
@ 0x7f687cbdaa41 mesos::internal::master::Master::_reregisterSlave()
@ 0x7f687cc46209 process::Owned<>::operator->()
@ 0x7f687cbe987a
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_12ExecutorInfoESaISG_EERKSF_INS6_4TaskESaISL_EERKSF_INS6_17Archive_FrameworkESaISQ_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SW_EEvRKNS0_3PIDIT_EEMS10_FvT0_T1_T2_T3_T4_T5_ET6_T7_T8_T9_T10_T11_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f687cc39e05 mesos::internal::master::fail()
@ 0x7f687cfa3c72 process::ProcessManager::resume()
@ 0x7f687cc39f97 mesos::internal::master::RegistrarProcess::abort()
@ 0x7f687cc3d77f mesos::internal::master::RegistrarProcess::_update()
@ 0x7f687cfa3f6c process::schedule()
@ 0x7f687c47883d start_thread
@ 0x7f687cc47b27
_ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master16RegistrarProcessERKNS0_6FutureI6OptionINS6_5state8protobuf8VariableINS6_8RegistryEEEEEESt5dequeINS0_5OwnedINS7_9OperationEEESaISN_EESH_SP_EEvRKNS0_3PIDIT_EEMSR_FvT0_T1_ET2_T3_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
@ 0x7f687b1e026d clone
{noformat}
[~jieyu] pointed out the following problematic code:
{code}
// Helper for failing a deque of operations.
void fail(deque<Owned<Operation> >* operations, const string& message)
{
  while (!operations->empty()) {
    // This reference becomes invalid!
    const Owned<Operation>& operation = operations->front();
    operations->pop_front();
    operation->fail(message);
  }
}
{code}
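The reference is bound to the deque's front element, and pop_front() destroys that element, so the later operation->fail(message) call goes through a dangling reference. One possible fix, sketched here under assumptions rather than taken from the actual patch, is to copy the owning handle before popping. In this self-contained sketch std::shared_ptr stands in for process::Owned, and the Operation struct and OperationPtr typedef are hypothetical stand-ins for the registrar's types:

```cpp
#include <cassert>
#include <deque>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for the registrar's Operation type: it only
// records the failure messages it receives.
struct Operation
{
  explicit Operation(std::vector<std::string>* messages)
    : messages_(messages) {}

  void fail(const std::string& message) { messages_->push_back(message); }

  std::vector<std::string>* messages_;
};

// Assumption: std::shared_ptr is used here in place of process::Owned.
typedef std::shared_ptr<Operation> OperationPtr;

// Helper for failing a deque of operations.
void fail(std::deque<OperationPtr>* operations, const std::string& message)
{
  while (!operations->empty()) {
    // Copy the handle (not a reference) before pop_front() destroys
    // the deque element, so the operation outlives the pop.
    OperationPtr operation = operations->front();
    operations->pop_front();
    operation->fail(message);
  }
}
```

Taking a copy costs one reference-count increment per operation, which is negligible next to the failed registry update that triggers this path.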