[ https://issues.apache.org/jira/browse/MESOS-7991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16365532#comment-16365532 ]
Alexander Rukletsov commented on MESOS-7991:
--------------------------------------------
Lowering priority to "Major" because the issue appears to be rare (we have seen only
one instance so far) and is not severe. Keeping it open because an internal
invariant, the one asserted by the failed check !framework->recovered(), apparently
can be violated.
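To make the failure mode concrete, here is a toy Python model. It is hypothetical: Framework, recovered, check and reconcile_known_agent are illustrative names, not Mesos code. It assumes only what the log shows: at this point the master asserts that a framework is not in the recovered state, and a glog fatal CHECK (google::LogMessageFatal in the stack trace) logs the failure and aborts the process, which matches systemd reporting status=6/ABRT. In the model the abort is replaced by an exception so the behavior can be exercised.

```python
from __future__ import annotations

from dataclasses import dataclass


class CheckFailure(Exception):
    """Stands in for a glog fatal CHECK, which would log and abort() the process."""


def check(condition: bool, expression: str) -> None:
    """Fail fast: treat a false condition as a broken invariant (hypothetical helper)."""
    if not condition:
        raise CheckFailure(f"Check failed: {expression}")


@dataclass
class Framework:
    framework_id: str
    recovered: bool  # hypothetical flag modeling framework->recovered()


def reconcile_known_agent(frameworks: list[Framework]) -> list[str]:
    """Hypothetical stand-in for the reconciliation step in the stack trace.

    Returns the IDs of the frameworks it processed, or raises CheckFailure at the
    first framework that violates the assumed invariant.
    """
    processed: list[str] = []
    for framework in frameworks:
        check(not framework.recovered, "!framework->recovered()")
        processed.append(framework.framework_id)
    return processed


if __name__ == "__main__":
    try:
        reconcile_known_agent([Framework("f1", recovered=False), Framework("f2", recovered=True)])
    except CheckFailure as failure:
        print(failure)  # prints "Check failed: !framework->recovered()"
```

A single framework that breaks the assumption is enough to stop the whole loop; with a real fatal CHECK that means the entire master process exits, not just the handling of one framework.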
> fatal, check failed !framework->recovered()
> -------------------------------------------
>
> Key: MESOS-7991
> URL: https://issues.apache.org/jira/browse/MESOS-7991
> Project: Mesos
> Issue Type: Bug
> Reporter: Jack Crawford
> Assignee: Alexander Rukletsov
> Priority: Critical
> Labels: reliability
>
> Mesos master crashed during what appears to be framework recovery.
> mesos master version: 1.3.1
> mesos agent version: 1.3.1
> {code}
> W0920 14:58:54.756364 25452 master.cpp:7568] Task 862181ec-dffb-4c03-8807-5fb4c4e9a907 of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756369 25452 master.cpp:7568] Task 9c21c48a-63ad-4d58-9e22-f720af19a644 of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756376 25452 master.cpp:7568] Task 05c451f8-c48a-47bd-a235-0ceb9b3f8d0c of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756381 25452 master.cpp:7568] Task e8641b1f-f67f-42fe-821c-09e5a290fc60 of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756386 25452 master.cpp:7568] Task f838a03c-5cd4-47eb-8606-69b004d89808 of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756392 25452 master.cpp:7568] Task 685ca5da-fa24-494d-a806-06e03bbf00bd of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> W0920 14:58:54.756397 25452 master.cpp:7568] Task 65ccf39b-5c46-4121-9fdd-21570e8068e6 of framework 889aae9d-1aab-4268-ba42-9d5c2461d871 unknown to the agent a498d458-bbca-426e-b076-b328f5b035da-S5225 at slave(1)@10.0.239.217:5051 (ip-10-0-239-217) during re-registration: reconciling with the agent
> F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: !framework->recovered()
> *** Check failure stack trace: ***
> @ 0x7f7bf80087ed google::LogMessage::Fail()
> @ 0x7f7bf800a5a0 google::LogMessage::SendToLog()
> @ 0x7f7bf80083d3 google::LogMessage::Flush()
> @ 0x7f7bf800afc9 google::LogMessageFatal::~LogMessageFatal()
> @ 0x7f7bf736fe7e mesos::internal::master::Master::reconcileKnownSlave()
> @ 0x7f7bf739e612 mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f7bf73a580e _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERK6OptionINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEERKSt6vectorINS5_8ResourceESaISQ_EERKSP_INS5_12ExecutorInfoESaISV_EERKSP_INS5_4TaskESaIS10_EERKSP_INS5_13FrameworkInfoESaIS15_EERKSP_INS6_17Archive_FrameworkESaIS1A_EERKSL_RKSP_INS5_20SlaveInfo_CapabilityESaIS1H_EERKNS0_6FutureIbEES9_SC_SM_SS_SX_S12_S17_S1C_SL_S1J_S1N_EEvRKNS0_3PIDIT_EEMS1R_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_T10_ET11_T12_T13_T14_T15_T16_T17_T18_T19_T20_T21_EUlS2_E_E9_M_invokeERKSt9_Any_dataOS2_
> @ 0x7f7bf7f5e69c process::ProcessBase::visit()
> @ 0x7f7bf7f71403 process::ProcessManager::resume()
> @ 0x7f7bf7f7c127 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f7bf60b5c80 (unknown)
> @ 0x7f7bf58c86ba start_thread
> @ 0x7f7bf55fe3dd (unknown)
> mesos-master.service: Main process exited, code=killed, status=6/ABRT
> mesos-master.service: Unit entered failed state.
> mesos-master.service: Failed with result 'signal'.
> {code}
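The master logs through Google glog (note the google::LogMessage frames in the stack trace), whose lines begin with a fixed prefix: a severity letter (I, W, E or F), month and day, wall-clock time with microseconds, thread ID, and the source file:line of the logging call. The following is a minimal sketch for pulling those fields out of a log like the one in this report, assuming one log entry per line. parse_glog_line and unknown_task are hypothetical helpers, not part of Mesos or glog, and the "unknown to the agent" pattern is assumed from the wording in this report; other Mesos versions may phrase it differently.

```python
import re
from typing import NamedTuple, Optional, Tuple

# glog line prefix: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
# (L is the severity letter; the thread ID may be space-padded).
GLOG_LINE = re.compile(
    r"^(?P<severity>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)
SEVERITIES = {"I": "INFO", "W": "WARNING", "E": "ERROR", "F": "FATAL"}

# Assumed shape of the reconciliation warning, taken from this report.
UNKNOWN_TASK = re.compile(
    r"^Task (?P<task>\S+) of framework (?P<framework>\S+) unknown to the agent"
)


class GlogRecord(NamedTuple):
    severity: str
    month: int
    day: int
    time: str
    thread: int
    source: str  # "file:line" of the logging call
    message: str


def parse_glog_line(line: str) -> Optional[GlogRecord]:
    """Split one glog-formatted line into its fields; return None for other lines."""
    match = GLOG_LINE.match(line)
    if match is None:
        return None
    return GlogRecord(
        severity=SEVERITIES[match["severity"]],
        month=int(match["month"]),
        day=int(match["day"]),
        time=match["time"],
        thread=int(match["thread"]),
        source=f"{match['file']}:{match['line']}",
        message=match["message"],
    )


def unknown_task(message: str) -> Optional[Tuple[str, str]]:
    """Return (task ID, framework ID) from an 'unknown to the agent' warning."""
    match = UNKNOWN_TASK.match(message)
    return (match["task"], match["framework"]) if match else None


if __name__ == "__main__":
    fatal = parse_glog_line(
        "F0920 14:58:54.756404 25452 master.cpp:7601] Check failed: !framework->recovered()"
    )
    print(fatal.severity, fatal.thread, fatal.source)  # FATAL 25452 master.cpp:7601
```

Applied to this report, the fields make the sequence easy to follow: all seven warnings (master.cpp:7568) and the fatal check (master.cpp:7601) come from the same thread, 25452, within about 40 microseconds, and every task being reconciled belongs to framework 889aae9d-1aab-4268-ba42-9d5c2461d871.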