[ 
https://issues.apache.org/jira/browse/MESOS-1193?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070648#comment-14070648
 ] 

Vinod Kone commented on MESOS-1193:
-----------------------------------

Hey Steven. Sorry to hear that you ran into this issue. This was fixed in our 
recently released 0.19.1.

I'm still curious why your entire cluster ran into this issue. AFAICT, this 
happens due to a race condition when a destroy() of the container happens while 
a launch() is still in progress. Were all your executors across your entire 
cluster stuck in launch (HDFS issue?), and did the framework (Chronos) try to 
kill them all due to a timeout? cc [~idownes]
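
To make the suspected race concrete, here is a minimal sketch of the 
interleaving, assuming a simplified model rather than the actual Mesos code 
or the actual 0.19.1 fix. ToyContainerizer, its members, and simulateRace() 
are hypothetical names made up for illustration; only the idea of a 
"promises" entry that exec() expects to find comes from the failed check in 
the log below.

```cpp
#include <set>
#include <string>

// Hypothetical stand-in for the containerizer's bookkeeping. This is NOT
// Mesos code; it only models "launch records an entry, destroy removes it,
// and a later launch step expects the entry to still be there".
class ToyContainerizer {
public:
  // First step of a launch: remember that the container is in flight.
  void launch(const std::string& containerId) {
    promises.insert(containerId);
  }

  // A destroy can arrive between launch() and exec().
  void destroy(const std::string& containerId) {
    promises.erase(containerId);
  }

  // Later step of the launch, run as a deferred continuation. An
  // unconditional CHECK here would abort the whole process when a destroy
  // won the race; this sketch reports failure for that one container.
  bool exec(const std::string& containerId) const {
    if (promises.count(containerId) == 0) {
      return false;  // Container was destroyed while launching.
    }
    return true;
  }

private:
  std::set<std::string> promises;
};

// Replays the suspected interleaving: launch, then destroy, then exec.
// Returns what exec() reports for the destroyed container.
bool simulateRace() {
  ToyContainerizer containerizer;
  containerizer.launch("c1");
  containerizer.destroy("c1");  // e.g. the framework kills on a timeout
  return containerizer.exec("c1");
}
```

In this model simulateRace() returns false, i.e. the late exec() step sees 
that the container is gone and fails only that launch. With a hard check in 
place of the guard, the same interleaving would take down the whole slave, 
which would match the crash reported here.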

> Check failed: promises.contains(containerId) crashes slave
> ----------------------------------------------------------
>
>                 Key: MESOS-1193
>                 URL: https://issues.apache.org/jira/browse/MESOS-1193
>             Project: Mesos
>          Issue Type: Bug
>          Components: containerization
>    Affects Versions: 0.18.0
>            Reporter: Tobi Knaup
>
> This was observed with four slaves on one machine, one framework (Marathon) 
> and around 100 tasks per slave.
> I0404 17:58:58.298075  3939 mesos_containerizer.cpp:891] Executor for 
> container '6d4de71c-a491-4544-afe0-afcbfa37094a' has exited
> I0404 17:58:58.298395  3938 slave.cpp:2047] Executor 'web_467-1396634277535' 
> of framework 201404041625-3823062160-55371-22555-0000 has terminated with 
> signal Killed
> E0404 17:58:58.298475  3929 slave.cpp:2320] Failed to unmonitor container for 
> executor web_467-1396634277535 of framework 
> 201404041625-3823062160-55371-22555-0000: Not monitored
> I0404 17:58:58.299075  3938 slave.cpp:1643] Handling status update 
> TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 
> from @0.0.0.0:0
> I0404 17:58:58.299232  3932 status_update_manager.cpp:315] Received status 
> update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.299360  3932 status_update_manager.cpp:368] Forwarding status 
> update TASK_FAILED (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000 
> to master@144.76.223.227:5050
> I0404 17:58:58.306967  3932 status_update_manager.cpp:393] Received status 
> update acknowledgement (UUID: c815e057-e7a2-4c26-a382-6796a1585d1d) for task 
> web_467-1396634277535 of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307049  3932 slave.cpp:2186] Cleaning up executor 
> 'web_467-1396634277535' of framework 201404041625-3823062160-55371-22555-0000
> I0404 17:58:58.307122  3932 gc.cpp:56] Scheduling 
> '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535/runs/6d4de71c-a491-4544-afe0-afcbfa37094a'
>  for gc 6.99999644578667days in the future
> I0404 17:58:58.307157  3932 gc.cpp:56] Scheduling 
> '/tmp/mesos5053/slaves/20140404-164105-3823062160-5050-24762-5/frameworks/201404041625-3823062160-55371-22555-0000/executors/web_467-1396634277535'
>  for gc 6.99999644553185days in the future
> F0404 17:58:58.597434  3938 mesos_containerizer.cpp:682] Check failed: 
> promises.contains(containerId)
> *** Check failure stack trace: ***
>     @     0x7f5209da6e5d  google::LogMessage::Fail()
>     @     0x7f5209da8c9d  google::LogMessage::SendToLog()
>     @     0x7f5209da6a4c  google::LogMessage::Flush()
>     @     0x7f5209da9599  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f5209ad9f88  
> mesos::internal::slave::MesosContainerizerProcess::exec()
>     @     0x7f5209af3b56  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchI7NothingN5mesos8internal5slave25MesosContainerizerProcessERKNS6_11ContainerIDEiSA_iEENS0_6FutureIT_EERKNS0_3PIDIT0_EEMSH_FSF_T1_T2_ET3_T4_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
>     @     0x7f5209cd0bf2  process::ProcessManager::resume()
>     @     0x7f5209cd0eec  process::schedule()
>     @     0x7f5208b48f6e  start_thread
>     @     0x7f52088739cd  (unknown)



--
This message was sent by Atlassian JIRA
(v6.2#6252)
