Xudong Ni created MESOS-10032:
---------------------------------

             Summary: Mesos agent should sever proactively master connection when failing to detect the leading master
                 Key: MESOS-10032
                 URL: https://issues.apache.org/jira/browse/MESOS-10032
             Project: Mesos
          Issue Type: Improvement
            Reporter: Xudong Ni

We have observed that this often happens when agents lose their ZooKeeper (ZK) connections, reset their master to None, and begin dropping messages from the master because they cannot verify that the messages are valid. This has left Jarvis unable to kill tasks (and they aren't counted as unreachable, because the master can still reach the agent). A reasonable solution is for the agent to disconnect from the master when it resets the master it tracks, since it is just going to drop control messages anyway.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
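
To make the difference between the current and the proposed behavior concrete, here is a minimal toy model in Python. It is not Mesos code: the `Agent` class, the `sever_on_reset` flag, and the `detected`/`receive` methods are all hypothetical names invented for illustration, and a raised `ConnectionError` simply stands in for the master noticing that its link to the agent is gone.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Agent:
    """Toy agent that tracks a leading master (hypothetical, not Mesos code)."""

    master: Optional[str] = None        # master the agent currently trusts
    connected_to: Optional[str] = None  # peer with an open connection to us
    sever_on_reset: bool = False        # True models the proposed behavior
    handled: List[str] = field(default_factory=list)
    dropped: List[str] = field(default_factory=list)

    def detected(self, master: Optional[str]) -> None:
        """Record a detection result; None means no leader could be detected."""
        self.master = master
        if master is not None:
            self.connected_to = master
        elif self.sever_on_reset:
            # Proposed: close the link so the sender sees a failure
            # instead of having its messages silently discarded.
            self.connected_to = None

    def receive(self, sender: str, message: str) -> bool:
        """Return True if the message was handled, False if it was dropped."""
        if self.connected_to != sender:
            raise ConnectionError("no open connection from " + sender)
        if self.master != sender:
            # Current behavior: the sender cannot be verified, so drop.
            self.dropped.append(message)
            return False
        self.handled.append(message)
        return True
```

With `sever_on_reset=False`, calling `detected(None)` and then `receive(...)` from the old master returns `False` and the message lands in `dropped`, which mirrors the silent loss of kill requests described above. With `sever_on_reset=True`, the same call raises `ConnectionError`, so in this model the sender learns immediately that the agent is no longer listening.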