Repository: mesos
Updated Branches:
  refs/heads/master fe02e02cf -> c4fabadba


Improved task reconciliation documentation.


Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/c4fabadb
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/c4fabadb
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/c4fabadb

Branch: refs/heads/master
Commit: c4fabadbaa9e9bc98f04b639936fbbb49346bf25
Parents: fe02e02
Author: Benjamin Mahler
Authored: Mon Jul 27 10:55:39 2015 -0700
Committer: Benjamin Mahler
Committed: Mon Jul 27 11:34:17 2015 -0700

----------------------------------------------------------------------
 docs/reconciliation.md | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/mesos/blob/c4fabadb/docs/reconciliation.md
----------------------------------------------------------------------
diff --git a/docs/reconciliation.md b/docs/reconciliation.md
index 89ddf5b..b50d692 100644
--- a/docs/reconciliation.md
+++ b/docs/reconciliation.md
@@ -40,6 +40,15 @@ task state reconciliation.
 
 ## Task Reconciliation
 
+Mesos provides two forms of reconciliation:
+
+* "Explicit" reconciliation: the scheduler sends some of its non-terminal
+tasks and the master responds with the latest state for each task, if
+possible.
+* "Implicit" reconciliation: the scheduler sends an empty list of tasks
+and the master responds with the latest state for all currently known
+non-terminal tasks.
+
 **Tasks must be reconciled explicitly by the framework after a failure.**
 
 This is because the scheduler driver does not persist any task information.
@@ -50,6 +59,7 @@ framework.
 So, for now, let's look at how one needs to implement task state
 reconciliation in a framework scheduler.
 
+
 ### API
 
 Frameworks send a list of `TaskStatus` messages to the master:
@@ -73,25 +83,37 @@ slaves that are transitioning between states.
 
 ### Algorithm
 
-The technique for performing reconciliation should reconcile all non-terminal
-tasks, until an update is received for each task, using exponential backoff:
+This technique for explicit reconciliation reconciles all non-terminal tasks,
+until an update is received for each task, using exponential backoff to retry
+tasks that remain unreconciled. Retries are needed because the master may
+temporarily be unable to reply for a particular task. For example, during master
+failover the master must re-register all of the slaves to rebuild its
+set of known tasks (this process can take minutes for large clusters, and
+is bounded by the `--slave_reregister_timeout` flag on the master).
+
+Steps:
 
 1. let `start = now()`
 2. let `remaining = { T in tasks | T is non-terminal }`
 3. Perform reconciliation: `reconcile(remaining)`
 4. Wait for status updates to arrive (use truncated exponential backoff). For each update, note the time of arrival.
-5. let `remaining = { T in remaining | T.last_update_arrival() < start }`
+5. let `remaining = { T ∈ remaining | T.last_update_arrival() < start }`
 6. If `remaining` is non-empty, go to 3.
 
 This reconciliation algorithm **must** be run after each (re-)registration.
 
+Implicit reconciliation (passing an empty list) should also be used
+periodically, as a defense against data loss in the framework. Unless a
+strict registry is in use on the master, it's possible for tasks to resurrect
+from a LOST state (without a strict registry the master does not enforce
+slave removal across failovers). When an unknown task is encountered, the
+scheduler should kill or recover the task.
+
 Notes:
 
 * When waiting for updates to arrive, **use a truncated exponential backoff**.
 This will avoid a snowball effect in the case of the driver or master being
 backed up.
-* Implicit reconciliation (passing an empty list) can also be used
-periodically, As a defense against data loss in the framework.
 * It is beneficial to ensure that only one reconciliation is in progress at a
 time, to avoid a snowball effect in the face of many re-registrations.
 If another reconciliation should be started while one is in progress,

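The explicit reconciliation steps in the updated docs/reconciliation.md (steps 1 to 6, with truncated exponential backoff) can be sketched roughly as follows. This is a minimal Python illustration, not Mesos code: `reconcile` (a stand-in for sending `TaskStatus` messages to the master), the `tasks` mapping, and the `last_update_arrival` lookup (named after the pseudocode's `T.last_update_arrival()`) are hypothetical hooks, and the 1 s base and 60 s cap for the backoff are assumed values, not Mesos defaults.

```python
import time
from typing import Callable, Dict, Set


def truncated_backoff(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Delay after round `attempt` (0-based): base * 2**attempt, truncated at `cap`."""
    # Clamp the exponent so very long runs cannot overflow a float.
    return min(cap, base * 2.0 ** min(attempt, 32))


def reconcile_explicitly(
    tasks: Dict[str, bool],
    reconcile: Callable[[Set[str]], None],
    last_update_arrival: Callable[[str], float],
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    base: float = 1.0,
    cap: float = 60.0,
) -> int:
    """Repeat steps 3-6 until every non-terminal task has a fresh update.

    tasks: hypothetical framework state, task ID -> True if the task is terminal.
    reconcile: hypothetical hook that sends these task IDs to the master.
    last_update_arrival: hypothetical lookup of when the latest status update
        for a task arrived (float('-inf') if none), on the same clock as `now`.
    Returns the number of reconciliation rounds sent.
    """
    start = now()                                                     # step 1
    remaining = {t for t, terminal in tasks.items() if not terminal}  # step 2
    rounds = 0
    # Step 6 is the loop condition. Checking it first also avoids sending an
    # empty list, which the master would treat as implicit reconciliation.
    while remaining:
        reconcile(remaining)                                          # step 3
        # Step 4: updates arrive asynchronously (e.g. via the scheduler's
        # status update callback); this sketch just waits with truncated backoff.
        sleep(truncated_backoff(rounds, base, cap))
        rounds += 1
        # Step 5: keep only tasks whose latest update predates `start`.
        remaining = {t for t in remaining if last_update_arrival(t) < start}
    return rounds
```

Injecting `now` and `sleep` keeps the loop testable with a fake clock. A real scheduler would more likely drive these rounds from timers and its status update callback than block in a loop, and would stop a round early if a newer reconciliation superseded it.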