[ https://issues.apache.org/jira/browse/MESOS-7077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15856394#comment-15856394 ]

James Peach commented on MESOS-7077:
------------------------------------

We have a test environment that continuously deploys nightly builds of the ASF
master repository. The deploy is a Puppet change, so there is no guaranteed
ordering between the master and agent upgrades. When the master is redeployed,
the expectation is that agents running potentially different versions will
reconnect to the new master and that tasks won't be disrupted.

> Check failed: resource.has_allocation_info().
> ---------------------------------------------
>
>                 Key: MESOS-7077
>                 URL: https://issues.apache.org/jira/browse/MESOS-7077
>             Project: Mesos
>          Issue Type: Bug
>            Reporter: James Peach
>            Priority: Critical
>
> Seeing this {{CHECK}} fail with top-of-tree master:
> {noformat}
> F0207 16:00:44.657328 3351272 master.cpp:8980] Check failed: 
> resource.has_allocation_info()
> {noformat}
> The symbolicated backtrace is:
> {noformat}
> (gdb) where
> #0  0x00007f009f1315e5 in raise () from /lib64/libc.so.6
> #1  0x00007f009f132dc5 in abort () from /lib64/libc.so.6
> #2  0x00007f00a168e496 in google::DumpStackTraceAndExit () at 
> src/utilities.cc:147
> #3  0x00007f00a1685e7d in google::LogMessage::Fail () at src/logging.cc:1458
> #4  0x00007f00a1687c0d in google::LogMessage::SendToLog (this=Unhandled dwarf 
> expression opcode 0xf3
> ) at src/logging.cc:1412
> #5  0x00007f00a1685a02 in google::LogMessage::Flush (this=0x7f00917ef560) at 
> src/logging.cc:1281
> #6  0x00007f00a16885e9 in google::LogMessageFatal::~LogMessageFatal 
> (this=Unhandled dwarf expression opcode 0xf3
> ) at src/logging.cc:1984
> #7  0x00007f00a0a1184c in mesos::internal::master::Slave::addTask 
> (this=0x7f007c830280, task=0x7f0080835340)
>     at ../../src/master/master.cpp:8980
> #8  0x00007f00a0a18b53 in mesos::internal::master::Slave::Slave 
> (this=0x7f007c830280, _master=Unhandled dwarf expression opcode 0xf3
> )
>     at ../../src/master/master.cpp:8947
> #9  0x00007f00a0a19c57 in mesos::internal::master::Master::_reregisterSlave 
> (this=0x7f00990bf000,
>     slaveInfo=..., pid=..., checkpointedResources=Unhandled dwarf expression 
> opcode 0xf3
> ) at ../../src/master/master.cpp:5759
> #10 0x00007f00a0a1cb22 in operator() (__functor=Unhandled dwarf expression 
> opcode 0xf3
> )
>     at ../../3rdparty/libprocess/include/process/dispatch.hpp:229
> #11 std::_Function_handler<void(process::ProcessBase*), 
> process::dispatch(const process::PID<T>&, void (T::*)(P0, P1, P2, P3, P4, P5, 
> P6, P7, P8, P9), A0, A1, A2, A3, A4, A5, A6, A7, A8, A9) [with T = 
> mesos::internal::master::Master; P0 = const mesos::SlaveInfo&; P1 = const 
> process::UPID&; P2 = const std::vector<mesos::Resource>&; P3 = const 
> std::vector<mesos::ExecutorInfo>&; P4 = const std::vector<mesos::Task>&; P5 = 
> const std::vector<mesos::FrameworkInfo>&; P6 = const 
> std::vector<mesos::internal::Archive_Framework>&; P7 = const 
> std::basic_string<char>&; P8 = const 
> std::vector<mesos::SlaveInfo_Capability>&; P9 = const process::Future<bool>&; 
> A0 = mesos::SlaveInfo; A1 = process::UPID; A2 = std::vector<mesos::Resource>; 
> A3 = std::vector<mesos::ExecutorInfo>; A4 = std::vector<mesos::Task>; A5 = 
> std::vector<mesos::FrameworkInfo>; A6 = 
> std::vector<mesos::internal::Archive_Framework>; A7 = 
> std::basic_string<char>; A8 = std::vector<mesos::SlaveInfo_Capability>; A9 = 
> process::Future<bool>]::<lambda(process::ProcessBase*)> >::_M_invoke(const 
> std::_Any_data &, process::ProcessBase *) (
>     __functor=Unhandled dwarf expression opcode 0xf3
> {noformat}
> I expect that this happened because the master moved to the latest version 
> before all the agents had moved.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)