[
https://issues.apache.org/jira/browse/MESOS-7389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Neil Conway reassigned MESOS-7389:
----------------------------------
Resolution: Fixed
Assignee: Neil Conway (was: Benjamin Mahler)
Fix Version/s: 1.4.0
Now that we have landed the fix for MESOS-6976, this crash should be
fixed/avoided.
> Mesos 1.2.0 crashes with pre-1.0 Mesos agents.
> ----------------------------------------------
>
> Key: MESOS-7389
> URL: https://issues.apache.org/jira/browse/MESOS-7389
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 1.2.0
> Environment: Ubuntu 14.04
> Reporter: Nicholas Studt
> Assignee: Neil Conway
> Priority: Critical
> Labels: mesosphere
> Fix For: 1.4.0
>
>
> During an upgrade from 1.0.1 to 1.2.0, a single mesos-slave reregistering
> with the running leader caused the leader to terminate. All 3 masters
> suffered the same failure as the same slave node reregistered against the new
> leader. This continued across the entire cluster until the offending slave
> node was removed and fixed. The fix for the slave node was to remove the mesos
> directory and then start the slave node back up.
> F0412 17:24:42.736600 6317 master.cpp:5701] Check failed: frameworks_.contains(task.framework_id())
> *** Check failure stack trace: ***
> @ 0x7f59f944f94d google::LogMessage::Fail()
> @ 0x7f59f945177d google::LogMessage::SendToLog()
> @ 0x7f59f944f53c google::LogMessage::Flush()
> @ 0x7f59f9452079 google::LogMessageFatal::~LogMessageFatal()
> I0412 17:24:42.750300 6316 replica.cpp:693] Replica received learned notice for position 6896 from @0.0.0.0:0
> @ 0x7f59f88f2341 mesos::internal::master::Master::_reregisterSlave()
> @ 0x7f59f88f488f
> _ZNSt17_Function_handlerIFvPN7process11ProcessBaseEEZNS0_8dispatchIN5mesos8internal6master6MasterERKNS5_9SlaveInfoERKNS0_4UPIDERKSt6vectorINS5_8ResourceESaISG_EERKSF_INS5_12ExecutorInfoESaISL_EERKSF_INS5_4TaskESaISQ_EERKSF_INS5_13FrameworkInfoESaISV_EERKSF_INS6_17Archive_FrameworkESaIS10_EERKSsRKSF_INS5_20SlaveInfo_CapabilityESaIS17_EERKNS0_6FutureIbEES9_SC_SI_SN_SS_SX_S12_SsS19_S1D_EEvRKNS0_3PIDIT_EEMS1H_FvT0_T1_T2_T3_T4_T5_T6_T7_T8_T9_ET10_T11_T12_T13_T14_T15_T16_T17_T18_T19_EUlS2_E_E9_M_invokeERKSt9_Any_dataS2_
> @ 0x7f59f93c3eb1 process::ProcessManager::resume()
> @ 0x7f59f93ccd57
> _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
> @ 0x7f59f77cfa60 (unknown)
> @ 0x7f59f6fec184 start_thread
> @ 0x7f59f6d19bed (unknown)