[ https://issues.apache.org/jira/browse/MESOS-10194?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17224776#comment-17224776 ]
Jerome Soussens commented on MESOS-10194:
-----------------------------------------

Hi [~asekretenko],

I don't know if it's related, but today we had this failure on the master with Mesos 1.10.0:

{code:java}
F1102 11:40:04.209203 6522 hierarchical.cpp:233] Check failed: scalars.at(slaveID) does not contain cpus(allocated: xxxxx):1; mem(allocated: xxxxx):15360
*** Check failure stack trace: ***
e06184040 with resources mem(allocated: stable.main):256 of framework b98761e9-2e84-4971-b678-13b6619b18e1 on agent 4bfe1c0d-aabc-45f4-98fe-3a5480058440-S0 at slave(1)@192.168.250.63:5051 (leta.sophiagenetics.com)
    @ 0x7fd65a6fb94d google::LogMessage::SendToLog()
    @ 0x7fd65a6f91fb google::LogMessage::Flush()
    @ 0x7fd65a6fc3a9 google::LogMessageFatal::~LogMessageFatal()
    @ 0x7fd659114217 mesos::internal::master::allocator::internal::ScalarResourceTotals::subtract()
    @ 0x7fd659118202 mesos::internal::master::allocator::internal::RoleTree::untrackAllocated()
    @ 0x7fd659129afa mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::recoverResources()
    @ 0x7fd65a637081 process::ProcessBase::consume()
    @ 0x7fd65a65ceb7 process::ProcessManager::resume()
    @ 0x7fd65a660a76 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUlvE_vEEE6_M_runEv
    @ 0x7fd65a916d80 execute_native_thread_routine
    @ 0x7fd6569cee25 start_thread
    @ 0x7fd6561dbbad __clone
mesos-master.service: main process exited, code=killed, status=6/ABRT
{code}

A more complete log: [^mesos_scalars_at_slaveId_crash.log]

> Mesos master failure "Check failed: 'get_(role)' Must be SOME"
> --------------------------------------------------------------
>
>                 Key: MESOS-10194
>                 URL: https://issues.apache.org/jira/browse/MESOS-10194
>             Project: Mesos
>          Issue Type: Bug
>   Affects Versions: 1.10.0, 1.11.0
>            Reporter: Jerome Soussens
>            Assignee: Andrei Sekretenko
>            Priority: Critical
>         Attachments: log_mesos_crash_role_13102020.txt
>
>
> *Impact*: mesos-master crashes with the log:
> {code:java}
> hierarchical.cpp:460] Check failed: 'get_(role)' Must be SOME
> {code}
> *Possible scenario:*
> A framework using a specific role is stopped. At roughly the same
> time, some remaining task status for this framework comes to the master from
> the executor, but the role is no longer listed.
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)