Modified the executor driver to always relink on agent failover. A relink is needed because a netfilter module such as iptables can terminate the connection without notifying the executor. As a result, upon receiving the reconnect message from the agent, the executor would try to reuse the stale "half-open" connection, leading to erroneous behavior.
Review: https://reviews.apache.org/r/56568/

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/d9e3c8aa
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/d9e3c8aa
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/d9e3c8aa

Branch: refs/heads/1.1.x
Commit: d9e3c8aa4aedf1dac7716cc31a3e71f8db0242da
Parents: 044fafe
Author: Anand Mazumdar <[email protected]>
Authored: Fri Feb 10 17:10:49 2017 -0800
Committer: Alexander Rukletsov <[email protected]>
Committed: Wed Apr 26 14:40:21 2017 +0200

----------------------------------------------------------------------
 src/exec/exec.cpp | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/d9e3c8aa/src/exec/exec.cpp
----------------------------------------------------------------------
diff --git a/src/exec/exec.cpp b/src/exec/exec.cpp
index 1dc2039..2ac312d 100644
--- a/src/exec/exec.cpp
+++ b/src/exec/exec.cpp
@@ -284,7 +284,12 @@ protected:
 
     // Update the slave link.
     slave = from;
-    link(slave);
+
+    // We force a reconnect here to avoid sending on a stale "half-open"
+    // socket. We do not detect a disconnection in some cases when the
+    // connection is terminated by a netfilter module e.g., iptables
+    // running on the agent (see MESOS-5332).
+    link(slave, RemoteConnection::RECONNECT);
 
     // Re-register with slave.
     ReregisterExecutorMessage message;
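
For readers unfamiliar with the failure mode, the following is a minimal sketch of why reusing a cached link fails after a silent drop while a forced reconnect succeeds. It is a toy model and not libprocess code: `Socket`, `Linker`, `reregisterAfterSilentDrop`, the agent address, and the `REUSE` name for the default mode are all assumptions made for illustration. Only `RemoteConnection::RECONNECT` appears in the diff above.

```cpp
#include <map>
#include <memory>
#include <string>

// Toy stand-in for a TCP socket. `peerAlive` is false when the remote end
// (or a firewall in between) dropped the connection without telling us.
struct Socket {
  bool peerAlive = true;
};

// The two linking modes of this sketch. REUSE is an assumed name for the
// default behavior; RECONNECT mirrors the value used in the diff.
enum class RemoteConnection { REUSE, RECONNECT };

// Hypothetical link cache, keyed by peer address.
class Linker {
public:
  // Returns the socket used for `peer`. REUSE keeps a cached socket if one
  // exists; RECONNECT always discards it and opens a fresh one.
  std::shared_ptr<Socket> link(
      const std::string& peer,
      RemoteConnection mode = RemoteConnection::REUSE) {
    auto it = sockets.find(peer);
    if (it != sockets.end() && mode == RemoteConnection::REUSE) {
      return it->second;
    }
    auto socket = std::make_shared<Socket>();
    sockets[peer] = socket;
    return socket;
  }

  // A send succeeds only if a socket exists and the peer still has it open.
  bool send(const std::string& peer) const {
    auto it = sockets.find(peer);
    return it != sockets.end() && it->second->peerAlive;
  }

private:
  std::map<std::string, std::shared_ptr<Socket>> sockets;
};

// Simulates: initial link, silent drop of the connection, then a relink
// with `mode` followed by a re-registration send. Returns whether the
// send got through.
bool reregisterAfterSilentDrop(RemoteConnection mode) {
  Linker linker;
  const std::string agent = "agent@host:5051";  // hypothetical address
  linker.link(agent)->peerAlive = false;  // dropped without notification
  linker.link(agent, mode);
  return linker.send(agent);
}
```

In this model, `reregisterAfterSilentDrop(RemoteConnection::REUSE)` returns false because the cached half-open socket is handed back, whereas `reregisterAfterSilentDrop(RemoteConnection::RECONNECT)` returns true because a fresh socket replaces it.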
