Repository: mesos
Updated Branches:
  refs/heads/master 2ec2e48d1 -> 498a000ac

Updated configuration.md for --executor_reregistration_retry_interval.

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/498a000a
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/498a000a
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/498a000a

Branch: refs/heads/master
Commit: 498a000ac1bb8f51dc871f22aea265424a407a17
Parents: 2ec2e48
Author: Adam B <a...@mesosphere.io>
Authored: Wed Aug 2 01:24:10 2017 -0700
Committer: Adam B <a...@mesosphere.io>
Committed: Wed Aug 2 01:27:39 2017 -0700

----------------------------------------------------------------------
 docs/configuration.md | 27 ++++++++++++++++++++++++++-
 1 file changed, 26 insertions(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/498a000a/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 5449b92..058e366 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1540,7 +1540,32 @@ master until this timeout has elapsed (see MESOS-7539). (default: 2secs)
 </tr>
 <tr>
   <td>
-    --max_completed_executors_per_framework
+    --executor_reregistration_retry_interval=VALUE
+  </td>
+  <td>
+For PID-based executors, how long the agent waits before retrying
+the reconnect message sent to the executor during recovery.
+NOTE: Do not use this unless you understand the following
+(see MESOS-5332): PID-based executors using Mesos libraries >= 1.1.2
+always re-link with the agent upon receiving the reconnect message.
+This avoids the executor replying on a half-open TCP connection to
+the old agent (possible if netfilter is dropping packets,
+see: MESOS-7057). However, PID-based executors using Mesos
+libraries < 1.1.2 do not re-link and are therefore prone to
+replying on a half-open connection after the agent restarts. If we
+only send a single reconnect message, these "old" executors will
+reply on their half-open connection and receive a RST; without any
+retries, they will fail to reconnect and be killed by the agent once
+the executor re-registration timeout elapses. To ensure these "old"
+executors can reconnect in the presence of netfilter dropping
+packets, we introduced optional retries of the reconnect message.
+This results in "old" executors correctly establishing a link
+when processing the second reconnect message. (default: no retries)
+  </td>
+</tr>
+<tr>
+  <td>
+    --max_completed_executors_per_framework=VALUE
   </td>
   <td>
 Maximum number of completed executors per framework to store