Megha Sharma created MESOS-8750:
-----------------------------------
Summary: Check failed: !slaves.registered.contains(task->slave_id)
Key: MESOS-8750
URL: https://issues.apache.org/jira/browse/MESOS-8750
Project: Mesos
Issue Type: Task
Components: master
Reporter: Megha Sharma
It appears that, in certain circumstances, an unreachable task is not cleaned
up from framework.unreachableTasks when its agent re-registers, which leads to
this check failure later, when the framework is being removed. When an agent
goes unreachable, the master adds the tasks from that agent to
framework.unreachableTasks. When such an agent re-registers, the master
removes from this data structure only the tasks that the agent specifies
during re-registration. However, there can be tasks that the agent doesn't
know about, e.g. if the runTask message for them was dropped, so such tasks
are never removed from unreachableTasks.
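The sequence described above can be sketched with a small, self-contained
model. This is not the Mesos master code: MasterModel, Task,
markUnreachable, reregister and staleTasks are hypothetical names invented
for illustration, and the real bookkeeping is considerably more involved.
The sketch only assumes the behaviour reported here, namely that
re-registration removes just the tasks the agent itself reports.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for the master's bookkeeping.
// None of these types are the real Mesos classes.
struct Task {
  std::string id;
  std::string agentId;
};

struct MasterModel {
  std::set<std::string> registeredAgents;  // models slaves.registered
  std::vector<Task> unreachableTasks;      // models framework.unreachableTasks

  // The agent goes unreachable: it leaves the registered set and every
  // task the master believes it runs is recorded as unreachable.
  void markUnreachable(const std::string& agentId,
                       const std::vector<std::string>& taskIds) {
    registeredAgents.erase(agentId);
    for (const std::string& taskId : taskIds) {
      unreachableTasks.push_back(Task{taskId, agentId});
    }
  }

  // The agent re-registers: only the tasks it reports are removed from
  // unreachableTasks (the assumed behaviour behind this issue).
  void reregister(const std::string& agentId,
                  const std::set<std::string>& reportedTaskIds) {
    registeredAgents.insert(agentId);
    unreachableTasks.erase(
        std::remove_if(
            unreachableTasks.begin(),
            unreachableTasks.end(),
            [&](const Task& task) {
              return task.agentId == agentId &&
                     reportedTaskIds.count(task.id) > 0;
            }),
        unreachableTasks.end());
  }

  // Tasks that would violate
  // !slaves.registered.contains(task->slave_id()) on framework removal:
  // still listed as unreachable although their agent is registered.
  std::vector<Task> staleTasks() const {
    std::vector<Task> stale;
    for (const Task& task : unreachableTasks) {
      if (registeredAgents.count(task.agentId) > 0) {
        stale.push_back(task);
      }
    }
    return stale;
  }
};
```

In this model, if the master launched tasks t1 and t2 on an agent but the
runTask message for t2 was dropped, the agent reports only t1 when it
re-registers, so t2 stays in unreachableTasks while its agent is registered
again. staleTasks() then returns t2, which corresponds to the condition the
failing check guards against.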
```
F0112 21:50:39.272985 44038 master.cpp:9617] Check failed: !slaves.registered.contains(task->slave_id())
*** Check failure stack trace: ***
@ 0x7fb7260692bd (unknown)
@ 0x7fb72606b04d (unknown)
@ 0x7fb726068e42 (unknown)
@ 0x7fb72606ba29 (unknown)
@ 0x7fb7251f5226 (unknown)
@ 0x7fb725120081 (unknown)
@ 0x7fb72519ca37 (unknown)
@ 0x7fb725fbb2fe (unknown)
@ 0x7fb724f75de9 (unknown)
@ 0x7fb725fb4fc2 (unknown)
@ 0x7fb725fc4a17 (unknown)
@ 0x7fb725fca276 (unknown)
@ 0x7fb72352d470 (unknown)
@ 0x7fb723784aa1 start_thread
@ 0x7fb722f47bcd clone
@ (nil) (unknown)
Aborted
```