Repository: mesos
Updated Branches:
  refs/heads/master fe02e02cf -> c4fabadba

Improved task reconciliation documentation.

Project: http://git-wip-us.apache.org/repos/asf/mesos/repo
Commit: http://git-wip-us.apache.org/repos/asf/mesos/commit/c4fabadb
Tree: http://git-wip-us.apache.org/repos/asf/mesos/tree/c4fabadb
Diff: http://git-wip-us.apache.org/repos/asf/mesos/diff/c4fabadb

Branch: refs/heads/master
Commit: c4fabadbaa9e9bc98f04b639936fbbb49346bf25
Parents: fe02e02
Author: Benjamin Mahler <[email protected]>
Authored: Mon Jul 27 10:55:39 2015 -0700
Committer: Benjamin Mahler <[email protected]>
Committed: Mon Jul 27 11:34:17 2015 -0700

----------------------------------------------------------------------
 docs/reconciliation.md | 32 +++++++++++++++++++++++++++-----
 1 file changed, 27 insertions(+), 5 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/mesos/blob/c4fabadb/docs/reconciliation.md
----------------------------------------------------------------------
diff --git a/docs/reconciliation.md b/docs/reconciliation.md
index 89ddf5b..b50d692 100644
--- a/docs/reconciliation.md
+++ b/docs/reconciliation.md
@@ -40,6 +40,15 @@ task state reconciliation.

 ## Task Reconciliation

+Mesos provides two forms of reconciliation:
+
+* "Explicit" reconciliation: the scheduler sends some of its non-terminal
+tasks and the master responds with the latest state for each task, if
+possible.
+* "Implicit" reconciliation: the scheduler sends an empty list of tasks
+and the master responds with the latest state for all currently known
+non-terminal tasks.
+
 **Tasks must be reconciled explicitly by the framework after a failure.**

 This is because the scheduler driver does not persist any task information.
@@ -50,6 +59,7 @@ framework.
 So, for now, let's look at how one needs to implement task state
 reconciliation in a framework scheduler.

+
 ### API

 Frameworks send a list of `TaskStatus` messages to the master:
@@ -73,25 +83,37 @@ slaves that are transitioning between states.

 ### Algorithm

-The technique for performing reconciliation should reconcile all non-terminal
-tasks, until an update is received for each task, using exponential backoff:
+This technique for explicit reconciliation reconciles all non-terminal tasks,
+until an update is received for each task, using exponential backoff to retry
+tasks that remain unreconciled. Retries are needed because the master temporarily
+may not be able to reply for a particular task. For example, during master
+failover the master must re-register all of the slaves to rebuild its
+set of known tasks (this process can take minutes for large clusters, and
+is bounded by the `--slave_reregister_timeout` flag on the master).
+
+Steps:

 1. let `start = now()`
 2. let `remaining = { T in tasks | T is non-terminal }`
 3. Perform reconciliation: `reconcile(remaining)`
 4. Wait for status updates to arrive (use truncated exponential backoff). For each update, note the time of arrival.
-5. let `remaining = { T in remaining | T.last_update_arrival() < start }`
+5. let `remaining = { T ϵ remaining | T.last_update_arrival() < start }`
 6. If `remaining` is non-empty, go to 3.

 This reconciliation algorithm **must** be run after each (re-)registration.

+Implicit reconciliation (passing an empty list) should also be used
+periodically, as a defense against data loss in the framework. Unless a
+strict registry is in use on the master, its possible for tasks to resurrect
+from a LOST state (without a strict registry the master does not enforce
+slave removal across failovers). When an unknown task is encountered, the
+scheduler should kill or recover the task.
+
 Notes:

 * When waiting for updates to arrive, **use a truncated exponential backoff**.
 This will avoid a snowball effect in the case of the driver or master being
 backed up.
-* Implicit reconciliation (passing an empty list) can also be used
-periodically, As a defense against data loss in the framework.
* It is beneficial to ensure that only 1 reconciliation is in progress at a time, to avoid a snowball effect in the face of many re-registrations. If another reconciliation should be started while one is in-progress,
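
Reviewer's illustration (not part of the commit): the six steps in the Algorithm hunk could be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the Mesos scheduler driver API. The names `reconcile_until_done`, `backoff_delays`, `reconcile`, and `last_update_arrival` are hypothetical; `reconcile` stands in for whatever call the framework uses to send its `TaskStatus` list to the master, and `last_update_arrival` for the framework's own bookkeeping of when each status update arrived. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time
from typing import Callable, Hashable, Iterable, Iterator, Set


def backoff_delays(initial: float, factor: float, cap: float) -> Iterator[float]:
    """Yield truncated exponential backoff delays: initial, initial*factor, ... capped at `cap`."""
    delay = initial
    while True:
        yield min(delay, cap)
        delay = min(delay * factor, cap)


def reconcile_until_done(
    non_terminal_tasks: Iterable[Hashable],
    reconcile: Callable[[Set[Hashable]], None],          # hypothetical: sends the explicit request
    last_update_arrival: Callable[[Hashable], float],    # hypothetical: arrival time of latest update
    now: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
    initial: float = 1.0,
    factor: float = 2.0,
    cap: float = 60.0,
) -> int:
    """Run explicit reconciliation until every task has an update newer than `start`.

    Returns the number of reconciliation rounds that were needed.
    `last_update_arrival` must use the same clock as `now`.
    """
    start = now()                               # step 1
    remaining = set(non_terminal_tasks)         # step 2
    delays = backoff_delays(initial, factor, cap)
    rounds = 0
    while remaining:                            # step 6: repeat while tasks remain
        reconcile(remaining)                    # step 3
        sleep(next(delays))                     # step 4: wait with truncated backoff
        remaining = {                           # step 5: keep tasks with no fresh update
            t for t in remaining if last_update_arrival(t) < start
        }
        rounds += 1
    return rounds
```

The sketch deliberately leaves out the "only one reconciliation in progress at a time" note and periodic implicit reconciliation; a real scheduler would also record update arrival times from its status-update callback rather than through a lookup function.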
