Benjamin Mahler created MESOS-675:
-------------------------------------
Summary: CHECK failure in the Master.
Key: MESOS-675
URL: https://issues.apache.org/jira/browse/MESOS-675
Project: Mesos
Issue Type: Bug
Reporter: Benjamin Mahler
Assignee: Benjamin Mahler
Priority: Blocker
Fix For: 0.14.0
Observed this failure in a staging cluster running 0.14.0-rc2.
{noformat}
F0902 06:01:11.105391 11876 master.cpp:564] Check failed: !slave->disconnected Slave 201308270033-1937777162-5050-50911-137 (<scrub>) already disconnected!
*** Check failure stack trace: ***
@ 0x7fb470894d8d google::LogMessage::Fail()
@ 0x7fb470898d77 google::LogMessage::SendToLog()
@ 0x7fb470897674 google::LogMessage::Flush()
@ 0x7fb4708978a6 google::LogMessageFatal::~LogMessageFatal()
@ 0x7fb4704aaea4 mesos::internal::master::Master::exited()
@ 0x7fb470786af4 process::ProcessManager::resume()
@ 0x7fb47078754f process::schedule()
@ 0x7fb46fef483d start_thread
@ 0x7fb46e8d6f8d clone
{noformat}
Grepping for this slave in the logs:
{noformat}
$ grep 201308270033-1937777162-5050-50911-137 /var/log/mesos/mesos-master.log
W0902 06:01:10.607168 11876 master.cpp:1317] Ignoring unknown exited executor thermos-1377831261464-mesos-slave-recovery-spinner-60-f0bcfda6-4f8d-4df4-bd74-0b15f32d0502 on slave 201308270033-1937777162-5050-50911-137 (<scrub>)
...
W0902 06:01:10.646383 11876 master.cpp:1317] Ignoring unknown exited executor thermos-1377964938274-mesos-slave-recovery-spinner-184-3a25b824-5d73-4be0-984d-606230c5e8ac on slave 201308270033-1937777162-5050-50911-137 (<scrub>)
W0902 06:01:10.699635 11876 master.cpp:1123] Slave at slave(1)@10.34.110.125:5051 (<scrub>) is being allowed to re-register with an already in use id (201308270033-1937777162-5050-50911-137)
I0902 06:01:10.700628 11868 hierarchical_allocator_process.hpp:434] Added slave 201308270033-1937777162-5050-50911-137 (<scrub>) with cpus(*):14; mem(*):21913; ports(*):[31000-32000]; disk(*):400000 (and cpus(*):10.96; mem(*):19866; ports(*):[31000-31003, 31005-31449, 31451-31580, 31582-31801, 31803-31927, 31929-32000]; disk(*):397809 available)
W0902 06:01:10.866525 11876 master.cpp:1123] Slave at slave(1)@10.34.110.125:5051 (<scrub>) is being allowed to re-register with an already in use id (201308270033-1937777162-5050-50911-137)
W0902 06:01:10.919178 11876 master.cpp:1123] Slave at slave(1)@10.34.110.125:5051 (<scrub>) is being allowed to re-register with an already in use id (201308270033-1937777162-5050-50911-137)
W0902 06:01:11.070862 11876 master.cpp:1123] Slave at slave(1)@10.34.110.125:5051 (<scrub>) is being allowed to re-register with an already in use id (201308270033-1937777162-5050-50911-137)
I0902 06:01:11.085773 11876 master.cpp:553] Slave 201308270033-1937777162-5050-50911-137 (<scrub>) disconnected
W0902 06:01:11.086096 11876 master.cpp:1404] Master returning resources offered because slave 201308270033-1937777162-5050-50911-137 is disconnected
I0902 06:01:11.086145 11867 hierarchical_allocator_process.hpp:459] Removed slave 201308270033-1937777162-5050-50911-137
I0902 06:01:11.104651 11876 master.cpp:553] Slave 201308270033-1937777162-5050-50911-137 (<scrub>) disconnected
F0902 06:01:11.105391 11876 master.cpp:564] Check failed: !slave->disconnected Slave 201308270033-1937777162-5050-50911-137 (<scrub>) already disconnected!
{noformat}
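To check whether other slaves hit the same pattern, a small log scan along the following lines might help. This is only a sketch, not part of Mesos: the function name `find_double_disconnects` and both regular expressions are hypothetical, and the message shapes are assumed from the excerpt above, with each glog entry on a single line. It reports any slave ID logged as "disconnected" twice with no "Added slave" entry for that ID in between.

```python
import re

# Assumed message shapes, copied from the log excerpt in this report.
# Both patterns are illustrative guesses, not a stable Mesos log format.
DISCONNECTED = re.compile(r"Slave (\S+) \(.*?\) disconnected$")
ADDED = re.compile(r"Added slave (\S+)")


def find_double_disconnects(lines):
    """Return slave IDs logged as disconnected twice in a row.

    "In a row" means no "Added slave <id>" entry for that ID appeared
    between the two "disconnected" entries.
    """
    disconnected = set()  # IDs whose most recent state change was a disconnect
    repeats = []
    for line in lines:
        line = line.rstrip()
        match = ADDED.search(line)
        if match:
            # The slave was (re-)added, so a later disconnect is expected.
            disconnected.discard(match.group(1))
            continue
        match = DISCONNECTED.search(line)
        if match:
            slave_id = match.group(1)
            if slave_id in disconnected:
                repeats.append(slave_id)
            disconnected.add(slave_id)
    return repeats
```

It could be run as `find_double_disconnects(open("/var/log/mesos/mesos-master.log"))`; for the excerpt above it would flag slave 201308270033-1937777162-5050-50911-137.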