Yvan Royon created MESOS-6986:
---------------------------------
Summary: abort in DRFSorter::add
Key: MESOS-6986
URL: https://issues.apache.org/jira/browse/MESOS-6986
Project: Mesos
Issue Type: Bug
Components: allocation
Affects Versions: 1.0.1
Environment: Mesosphere Enterprise DC/OS, CoreOS
Reporter: Yvan Royon
My mesos-master process was terminated by SIGABRT.
The CHECK failed in function {{DRFSorter::add}}:
https://github.com/apache/mesos/blob/master/src/master/allocator/sorter/drf/sorter.cpp#L74
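The failing check is {{!contains(name)}}, so the name being added was already
present in the sorter. It seems there is a condition during framework
registration where the same name gets added twice?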
We are using the mesos-go library ({{next}} branch), which uses the new HTTP
API. The framework is custom Go code. The crash is hard to reproduce reliably.
{code}
mesos-master[90061]: F0119 01:07:57.426159 90086 sorter.cpp:73] Check failed: !contains(name)
mesos-master[90061]: *** Check failure stack trace: ***
mesos-master[90061]: @ 0x7f960d9299fd google::LogMessage::Fail()
mesos-master[90061]: @ 0x7f960d92b82d google::LogMessage::SendToLog()
mesos-master[90061]: @ 0x7f960d9295ec google::LogMessage::Flush()
mesos-master[90061]: @ 0x7f960d92c129 google::LogMessageFatal::~LogMessageFatal()
mesos-master[90061]: @ 0x7f960d03460d mesos::internal::master::allocator::DRFSorter::add()
mesos-master[90061]: @ 0x7f960d021177 mesos::internal::master::allocator::internal::HierarchicalAllocatorProcess::addFramework()
mesos-master[90061]: @ 0x7f960d8b9381 process::ProcessManager::resume()
mesos-master[90061]: @ 0x7f960d8b9687 _ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
mesos-master[90061]: @ 0x7f960bf52d73 (unknown)
mesos-master[90061]: @ 0x7f960b74f52c (unknown)
mesos-master[90061]: @ 0x7f960b49180d (unknown)
systemd[1]: dcos-mesos-master.service: Main process exited, code=killed, status=6/ABRT
{code}
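To illustrate the kind of invariant the log points at, here is a minimal sketch of a container whose {{add}} aborts when the same name is added twice. It is not the Mesos implementation: {{ToySorter}}, {{tryAdd}} and the {{clients_}} member are hypothetical names made up for this example, and {{assert}} stands in for the glog {{CHECK}} seen in the trace.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for a sorter; NOT the Mesos DRFSorter.
class ToySorter {
public:
  bool contains(const std::string& name) const {
    return clients_.count(name) > 0;
  }

  // Same shape as the failing check: adding a name that is already
  // present aborts the process (SIGABRT), unless NDEBUG disables assert.
  void add(const std::string& name) {
    assert(!contains(name));
    clients_.insert(name);
  }

  // Non-fatal variant (an assumption for illustration): reports a
  // duplicate by returning false instead of aborting.
  bool tryAdd(const std::string& name) {
    return clients_.insert(name).second;
  }

  void remove(const std::string& name) {
    clients_.erase(name);
  }

private:
  std::set<std::string> clients_;
};
```

With this sketch, calling {{add("framework-1")}} twice without a {{remove}} in between trips the assertion, which would match the abort above if a framework were registered again while its name was still in the sorter.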