Added: aurora/site/source/documentation/latest/reference/task-lifecycle.md
URL: http://svn.apache.org/viewvc/aurora/site/source/documentation/latest/reference/task-lifecycle.md?rev=1739402&view=auto
==============================================================================
--- aurora/site/source/documentation/latest/reference/task-lifecycle.md (added)
+++ aurora/site/source/documentation/latest/reference/task-lifecycle.md Sat Apr 16 04:23:06 2016
@@ -0,0 +1,146 @@
# Task Lifecycle

When Aurora reads a configuration file and finds a `Job` definition, it:

1. Evaluates the `Job` definition.
2. Splits the `Job` into its constituent `Task`s.
3. Sends those `Task`s to the scheduler.
4. The scheduler puts the `Task`s into `PENDING` state, starting each
   `Task`'s lifecycle.

*(State diagram of task states omitted here.)*

Note that a couple of the task states described below are missing from
this state diagram.

## PENDING to RUNNING states

When a `Task` is in the `PENDING` state, the scheduler constantly
searches for machines satisfying that `Task`'s resource requirements
(RAM, disk space, CPU time) while honoring configuration constraints
such as "a `Task` must run on machines dedicated to a particular role"
or attribute limit constraints such as "at most 2 `Task`s from the same
`Job` may run on each rack". When the scheduler finds a suitable match,
it assigns the `Task` to a machine and puts the `Task` into the
`ASSIGNED` state.

From the `ASSIGNED` state, the scheduler sends an RPC to the slave
machine containing the `Task`'s configuration, which the slave uses to
spawn an executor responsible for the `Task`'s lifecycle. When the
scheduler receives an acknowledgment that the machine has accepted the
`Task`, the `Task` goes into the `STARTING` state.

The `STARTING` state initializes the `Task` sandbox. When the sandbox
is fully initialized, Thermos begins to invoke `Process`es, and the
slave machine sends an update to the scheduler that the `Task` is in
the `RUNNING` state.

## RUNNING to terminal states

There are various ways that an active `Task` can transition into a
terminal state. By definition, a `Task` can never leave a terminal
state. However, depending on the nature of the termination and the
originating `Job` definition (e.g. `service`, `max_task_failures`), a
replacement `Task` might be scheduled.

### Natural Termination: FINISHED, FAILED

A `RUNNING` `Task` can terminate without direct user interaction. For
example, it may be a finite computation that finishes, even something
as simple as `echo hello world`, or it could be an exceptional
condition in a long-lived service. If the `Task` is successful (its
underlying processes have succeeded with exit status `0` or finished
without reaching failure limits), it moves into the `FINISHED` state.
If it finished after reaching a set of failure limits, it goes into the
`FAILED` state.

A terminated `Task` that is subject to rescheduling will be temporarily
`THROTTLED` if it is considered to be flapping. A task is considered
flapping if its previous invocation terminated after less than 5
minutes (the scheduler default). The time a task must remain in the
`THROTTLED` state before it becomes eligible for rescheduling increases
with each consecutive failure.

### Forceful Termination: KILLING, RESTARTING

You can terminate a `Task` by issuing an `aurora job kill` command,
which moves it into the `KILLING` state. The scheduler then sends the
slave a request to terminate the `Task`. If the scheduler receives a
successful response, it moves the `Task` into the `KILLED` state and
never restarts it.

If a `Task` is forced into the `RESTARTING` state via the
`aurora job restart` command, the scheduler kills the underlying task
but in parallel schedules an identical replacement for it.

In either case, the responsible executor on the slave follows an
escalation sequence when killing a running task:

 1. If an `HttpLifecycleConfig` is not present, skip to (4).
 2. Send a POST to the `graceful_shutdown_endpoint` and wait 5 seconds.
 3. Send a POST to the `shutdown_endpoint` and wait 5 seconds.
 4. Send SIGTERM (`kill`) and wait at most `finalization_wait` seconds.
 5. Send SIGKILL (`kill -9`).

If the executor notices that all `Process`es in a `Task` have aborted
during this sequence, it will not proceed with subsequent steps.
Note that graceful shutdown is best-effort, and due to the many
inevitable realities of distributed systems, it may not be performed.
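To make the escalation sequence concrete, here is a hedged sketch of an
Aurora DSL config for a service that opts in to HTTP graceful shutdown
via `HttpLifecycleConfig`. The cluster, role, process command, and
endpoint values are illustrative assumptions, not taken from this page;
consult the Aurora configuration reference for the authoritative schema
and defaults:

```python
# Hypothetical .aurora config sketch. The service command and job key
# values below are assumptions for illustration only.
hello = Process(
  name = 'hello',
  cmdline = 'python -m my_service --port {{thermos.ports[http]}}')

task = SequentialTask(
  processes = [hello],
  resources = Resources(cpu = 1.0, ram = 128*MB, disk = 128*MB))

jobs = [Service(
  cluster = 'devcluster',      # assumed cluster name
  environment = 'devel',
  role = 'www-data',
  name = 'hello',
  task = task,
  # With this config present, steps (2) and (3) of the escalation
  # sequence POST to these endpoints before signals are sent.
  lifecycle = LifecycleConfig(
    http = HttpLifecycleConfig(
      graceful_shutdown_endpoint = '/quitquitquit',
      shutdown_endpoint = '/abortabortabort')))]
```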
### Unexpected Termination: LOST

If a `Task` stays in a transient task state for too long (such as
`ASSIGNED` or `STARTING`), the scheduler forces it into the `LOST`
state, creating a new `Task` in its place that is sent into the
`PENDING` state.

In addition, if the Mesos core tells the scheduler that a slave has
become unhealthy (or outright disappeared), the `Task`s assigned to
that slave go into the `LOST` state and new `Task`s are created in
their place. From the `PENDING` state, there is no guarantee a `Task`
will be reassigned to the same machine unless job constraints
explicitly force it there.

### Giving Priority to Production Tasks: PREEMPTING

Sometimes a `Task` needs to be interrupted, such as when a
non-production `Task`'s resources are needed by a higher-priority
production `Task`. This type of interruption is called a *preemption*.
When this happens in Aurora, the non-production `Task` is killed and
moved into the `PREEMPTING` state when both of the following are true:

- The task being killed is a non-production task.
- The other task is a `PENDING` production task that hasn't been
  scheduled due to a lack of resources.

The scheduler UI shows that the non-production task was preempted in
favor of the production task. At some point, tasks in `PREEMPTING` move
to `KILLED`.

Note that non-production tasks consuming many resources are likely to
be preempted in favor of production tasks.

### Making Room for Maintenance: DRAINING

Cluster operators can set a slave into maintenance mode. This
transitions all `Task`s running on that slave into the `DRAINING` state
and eventually to `KILLED`. Drained `Task`s will be restarted on other
slaves for which no maintenance has been announced yet.

## State Reconciliation

Due to the many inevitable realities of distributed systems, there
might be a mismatch between perceived and actual cluster state (e.g. a
machine returns from a network partition but the scheduler has already
marked all its `Task`s as `LOST` and rescheduled them).

Aurora regularly runs a state reconciliation process in order to detect
and correct such issues (e.g. by killing the errant `RUNNING` tasks).
By default, the proper detection of all failure scenarios and
inconsistencies may take up to an hour.

To emphasize this point: there is no uniqueness guarantee for a single
instance of a job in the presence of network partitions. If the `Task`
requires that, it should be baked in at the application level using a
distributed coordination service such as ZooKeeper.
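As an illustration of that last point, here is a minimal, hedged sketch
of application-level leader election using the `kazoo` ZooKeeper client.
The library choice, ensemble address, and znode path are assumptions
made for this example; Aurora does not prescribe any of them:

```python
# Minimal leader-election sketch using the kazoo ZooKeeper client
# (pip install kazoo). All names below -- the ensemble address and the
# election znode path -- are illustrative assumptions.
import socket

from kazoo.client import KazooClient


def lead():
    # Called once this contender wins the election. Privileged,
    # single-instance work belongs here; a production service should
    # also watch for ZooKeeper session loss and stop that work if
    # leadership is lost.
    print('This instance is now the leader.')


zk = KazooClient(hosts='zk1:2181,zk2:2181,zk3:2181')
zk.start()

# Every Task instance contends on the same znode path. run() blocks
# until this contender wins, then invokes lead(); losing contenders
# simply wait their turn.
election = zk.Election('/myservice/leader', identifier=socket.gethostname())
election.run(lead)
```

With such an election in each instance, even if Aurora briefly runs
duplicate `Task`s during a partition, at most one of them holds
leadership at a time (modulo ZooKeeper session-expiry edge cases, which
the application must still handle).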