[ 
https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365532#comment-16365532
 ] 

Alexander Rukletsov commented on MESOS-7991:
--------------------------------------------

Lowering priority to "Major" because the issue is apparently rare (we have only 
one instance so far) and not severe. Keeping it open because one internal 
invariant is apparently not an invariant and can break.

> fatal, check failed !framework->recovered()
> -------------------------------------------
>
>                 Key: MESOS-7991
>                 URL: https://issues.apache.org/jira/browse/MESOS-7991
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: Jack Crawford
>            Assignee: Alexander Rukletsov
>            Priority: Critical
>              Labels: reliability
>
> mesos master crashed on what appears to be framework recovery
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 
> 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 
> 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 
> 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task 
> e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task 
> f838a03c-5cd4-47eb-8606-69b004d89808 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 
> 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 
> 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 
> 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent 
> a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)
> @10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with 
> the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: 
> !framework->recovered()
> *** Check failure stack trace: ***
>     @     0x7f7bf80087ed  google::LogMessage::Fail()
>     @     0x7f7bf800a5a0  google::LogMessage::SendToLog()
>     @     0x7f7bf80083d3  google::LogMessage::Flush()
>     @     0x7f7bf800afc9  google::LogMessageFatal::~LogMessageFatal()
>     @     0x7f7bf736fe7e  
> mesos::internal::master::Master::reconcileKnownSlave()
>     @     0x7f7bf739e612  mesos::internal::master::Master::_reregisterSlave()
>     @     0x7f7bf73a580e  
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIc
> EEEERKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS
> 1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
>     @     0x7f7bf7f5e69c  process::ProcessBase::visit()
>     @     0x7f7bf7f71403  process::ProcessManager::resume()
>     @     0x7f7bf7f7c127  
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
>     @     0x7f7bf60b5c80  (unknown)
>     @     0x7f7bf58c86ba  start_thread
>     @     0x7f7bf55fe3dd  (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Reply via email to