Megha Sharma created MESOS-8750:
-----------------------------------

             Summary: Check failed: !slaves.registered.contains(task->slave_id)
                 Key: MESOS-8750
                 URL: https://issues.apache.org/jira/browse/MESOS-8750
             Project: Mesos
          Issue Type: Task
          Components: master
            Reporter: Megha Sharma


It appears that in certain circumstances an unreachable task is not removed 
from framework.unreachableTasks when its agent re-registers, which leads to 
this check failure later, when the framework is being removed. When an agent 
becomes unreachable, the master adds the agent's tasks to 
framework.unreachableTasks. When the agent re-registers, the master removes 
from this data structure only the tasks that the agent reports during 
re-registration. There can be tasks that the agent does not know about, e.g. 
if the runTask message for them was dropped, and such tasks are never removed 
from unreachableTasks.
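
The sequence described above can be modelled with a minimal sketch. This is 
not the Mesos master code: the `Master` and `Framework` structs, the string 
IDs, and the helpers `markUnreachable`, `reregister`, and 
`staleUnreachableTasks` are hypothetical simplifications, assumed here only 
to illustrate how a task the agent never learned about can stay behind in 
unreachableTasks after its agent is registered again.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for the master's bookkeeping.
struct Framework {
  // Maps task ID -> ID of the agent the task was launched on.
  std::map<std::string, std::string> unreachableTasks;
};

struct Master {
  std::set<std::string> registeredAgents;
  Framework framework;

  // The agent became unreachable: record every task the master believes
  // is running on it, including tasks the agent may never have received.
  void markUnreachable(const std::string& agentId,
                       const std::vector<std::string>& masterKnownTasks) {
    registeredAgents.erase(agentId);
    for (const std::string& taskId : masterKnownTasks) {
      framework.unreachableTasks[taskId] = agentId;
    }
  }

  // The agent re-registered: only the tasks it reports are removed.
  void reregister(const std::string& agentId,
                  const std::vector<std::string>& agentReportedTasks) {
    registeredAgents.insert(agentId);
    for (const std::string& taskId : agentReportedTasks) {
      framework.unreachableTasks.erase(taskId);
    }
  }

  // Tasks still marked unreachable although their agent is registered.
  // Each such entry would violate the invariant behind the failed check.
  std::vector<std::string> staleUnreachableTasks() const {
    std::vector<std::string> stale;
    for (const auto& entry : framework.unreachableTasks) {
      if (registeredAgents.count(entry.second) > 0) {
        stale.push_back(entry.first);
      }
    }
    return stale;
  }
};
```

In this model, if the master marks tasks "task-a" and "task-b" unreachable 
for an agent and the agent later re-registers reporting only "task-a" 
(because the runTask message for "task-b" was dropped), then "task-b" stays 
in unreachableTasks while its agent is registered.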

```

F0112 21:50:39.272985 44038 master.cpp:9617] Check failed: !slaves.registered.contains(task->slave_id())
*** Check failure stack trace: ***
@ 0x7fb7260692bd (unknown)
@ 0x7fb72606b04d (unknown)
@ 0x7fb726068e42 (unknown)
@ 0x7fb72606ba29 (unknown)
@ 0x7fb7251f5226 (unknown)
@ 0x7fb725120081 (unknown)
@ 0x7fb72519ca37 (unknown)
@ 0x7fb725fbb2fe (unknown)
@ 0x7fb724f75de9 (unknown)
@ 0x7fb725fb4fc2 (unknown)
@ 0x7fb725fc4a17 (unknown)
@ 0x7fb725fca276 (unknown)
@ 0x7fb72352d470 (unknown)
@ 0x7fb723784aa1 start_thread
@ 0x7fb722f47bcd clone
@ (nil) (unknown)
Aborted

```



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
