Xudong Ni created MESOS-10032:
---------------------------------

             Summary: Mesos agent should proactively sever its master connection 
when it fails to detect the leading master
                 Key: MESOS-10032
                 URL: https://issues.apache.org/jira/browse/MESOS-10032
             Project: Mesos
          Issue Type: Improvement
            Reporter: Xudong Ni


We have observed that this often happens when agents lose their ZK 
connections: each agent resets its master to None and begins dropping messages 
from the master because it can't verify that the messages are valid.

This has caused Jarvis to be unable to kill tasks (and they aren't counted as 
unreachable because the master can still reach the agent).

A reasonable solution is for the agent to disconnect from the master when it 
resets the master it tracks, since it is going to drop the master's control 
messages anyway.
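
To make the proposal concrete, here is a minimal toy sketch of the behavior 
described above. It is not Mesos code: the ToyAgent class, its method names, 
the disconnect hook, and the master address are all hypothetical and exist 
only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ToyAgent:
    """Toy model of an agent; not Mesos code. All names are hypothetical."""

    # Hypothetical hook that closes the agent's link to the master.
    disconnect: Callable[[], None]
    master: Optional[str] = None
    dropped: List[str] = field(default_factory=list)
    handled: List[str] = field(default_factory=list)

    def on_master_detected(self, master: Optional[str]) -> None:
        previous = self.master
        self.master = master
        # Proposed behavior: when detection fails (master becomes None),
        # sever the existing connection instead of staying connected and
        # silently ignoring whatever the old master sends next.
        if master is None and previous is not None:
            self.disconnect()

    def on_message(self, sender: str, message: str) -> None:
        # Behavior described in the report: with no tracked master the
        # agent cannot validate the sender, so the message is dropped.
        if self.master is None or sender != self.master:
            self.dropped.append(message)
            return
        self.handled.append(message)


events: List[str] = []
agent = ToyAgent(disconnect=lambda: events.append("disconnected"))

leader = "master@10.0.0.1:5050"  # made-up address
agent.on_master_detected(leader)
agent.on_message(leader, "KillTask")  # handled while the master is tracked
agent.on_master_detected(None)        # e.g. the ZK connection was lost
agent.on_message(leader, "KillTask")  # dropped: no tracked master
```

In this sketch the disconnect hook fires exactly once, at the moment the 
tracked master is reset. The intent, as the report suggests, is that the 
master then sees a broken connection rather than an agent that looks 
reachable but ignores kill requests.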



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
