[ https://issues.apache.org/jira/browse/MESOS-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070648#comment-14070648 ]
Vinod Kone commented on MESOS-1193:
-----------------------------------

Hey Steven. Sorry to hear that you ran into this issue. This was fixed in our recently released 0.19.1. I'm still curious why your entire cluster ran into this issue. AFAICT, this happens due to a race condition when a destroy() of the container happens while it is in the process of launching. Were all your executors across your entire cluster stuck in launch (HDFS issue?), and did the framework (Chronos) try to kill them all due to a timeout?

cc [~idownes]

> Check failed: promises.contains(containerId) crashes slave
> ----------------------------------------------------------
>
>                 Key: MESOS-1193
>                 URL: https://issues.apache.org/jira/browse/MESOS-1193
>             Project: Mesos
>          Issue Type: Bug
>      Components: containerization
>    Affects Versions: 0.18.0
>            Reporter: Tobi Knaup
>
> This was observed with four slaves on one machine, one framework (Marathon) and around 100 tasks per slave.
>
> I0404 17:58:58.298075  3939 mesos_containerizer.cpp:891] Executor for container '6d4de71c-a491-4544-afe0-afcbfa37094a' has exited
> I0404 17:58:58.298395  3938 slave.cpp:2047] Executor 'web_467-1396634277535' of framework 201404041625-3823062160-55371-22555-0000 has terminated with signal Killed
> E0404 17:58:58.298475  3929 slave.cpp:2320] Failed to unmonitor container for executor web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000: Not monitored
> I0404 17:58:58.299075  3938 slave.cpp:1643] Handling status update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 from @0.0.0.0:0
> I0404 17:58:58.299232  3932 status_update_manager.cpp:315] Received status update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.299360  3932 status_update_manager.cpp:368] Forwarding status update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 to master@144.76.223.227:5050
> I0404 17:58:58.306967  3932 status_update_manager.cpp:393] Received status update acknowledgement (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307049  3932 slave.cpp:2186] Cleaning up executor 'web_467-1396634277535' of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307122  3932 gc.cpp:56] Scheduling '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535/runs/6d4de71c-a491-4544-afe0-afcbfa37094a' for gc 6.99999644578667days in the future
> I0404 17:58:58.307157  3932 gc.cpp:56] Scheduling '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535' for gc 6.99999644553185days in the future
> F0404 17:58:58.597434  3938 mesos_containerizer.cpp:682] Check failed: promises.contains(containerId)
> *** Check failure stack trace: ***
>     @     0x7f5209da6e5d  google::LogMessage::Fail()
>     @     0x7f5209da8c9d  google::LogMessage::SendToLog()
>     @     0x7f5209da6a4c  google::LogMessage::Flush()
>     @     0x7f5209da9599  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f5209ad9f88  mesos::internal::slave::MesosContainerizerProcess::exec()
>     @     0x7f5209af3b56  _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f5209cd0bf2  process::ProcessManager::resume()
>     @     0x7f5209cd0eec  process::schedule()
>     @     0x7f5208b48f6e  start_thread
>     @     0x7f52088739cd  (unknown)

--
This message was sent by Atlassian JIRA
(v6.2#6252)