[ 
https://issues.apache.org/jira/browse/MESOS-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981420#comment-13981420
 ] 

Niklas Quarfot Nielsen commented on MESOS-938:
----------------------------------------------

I have been hacking on this on and off since I reported the issue. I should have 
more time to prioritize it after the external containerizer patches land.

I'd like to start a thorough design discussion before suggesting any code 
(lesson learned).

I suggest splitting the proposed replaceTask into:

- replaceTask(TaskID oldId, TaskInfo newTask, bool rollback = false)
- resizeTask(TaskInfo runningTask, vector<Offer> auxiliaryOffers)
- Extending task launch with a health check feature. TaskInfo would encode a 
repeated field of checks, ranging from built-in TCP/HTTP checks to scriptable 
checks. replaceTask would reuse this logic to implement rollback if the new 
task fails.
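As a sketch of the proposed split, the two calls could look like the following. All type names here are simplified stand-ins for the real Mesos protobuf types, and the recording driver exists only to exercise the interface; nothing below is existing Mesos API.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for the Mesos protobuf types; only the fields
// needed to illustrate the proposed signatures are modeled.
struct TaskID   { std::string value; };
struct TaskInfo { TaskID task_id; double cpus = 0; double mem = 0; };
struct Offer    { double cpus = 0; double mem = 0; };

// Sketch of the proposed API split. `rollback` asks for the old task to be
// restarted if the new one fails; `resizeTask` grows or shrinks a running
// task, optionally backed by auxiliary offers.
class SchedulerDriver {
public:
  virtual void replaceTask(const TaskID& oldId, const TaskInfo& newTask,
                           bool rollback = false) = 0;
  virtual void resizeTask(const TaskInfo& runningTask,
                          const std::vector<Offer>& auxiliaryOffers) = 0;
  virtual ~SchedulerDriver() {}
};

// Minimal recording implementation, just to show how a framework would
// drive the two calls.
class RecordingDriver : public SchedulerDriver {
public:
  void replaceTask(const TaskID& oldId, const TaskInfo& newTask,
                   bool rollback = false) override {
    lastReplaced = oldId.value;
    lastRollback = rollback;
  }
  void resizeTask(const TaskInfo& runningTask,
                  const std::vector<Offer>& auxiliaryOffers) override {
    lastResized = runningTask.task_id.value;
    offersUsed = auxiliaryOffers.size();
  }
  std::string lastReplaced, lastResized;
  bool lastRollback = false;
  size_t offersUsed = 0;
};
```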

If this sounds reasonable, I will file individual issues for the above, start 
detailed discussions around them, and find interested contributors/committers 
to develop/shepherd these changes.

> Add replace task primitive
> --------------------------
>
>                 Key: MESOS-938
>                 URL: https://issues.apache.org/jira/browse/MESOS-938
>             Project: Mesos
>          Issue Type: Improvement
>          Components: c++ api, master, slave
>            Reporter: Niklas Quarfot Nielsen
>         Attachments: resource-flow.png, sequence.png
>
>
> A replaceTask primitive could allow easier up- and down-scaling by reusing 
> resources from an old task either partially (leaving the resource delta 
> available), fully, or augmented (backed by additional offers) for launching a 
> new task. Further, recovery logic can be applied which restarts the old task 
> in case the new task fails.
> The signature could be something like:
> replaceTask(TaskID oldTaskId, TaskInfo newTaskInfo, vector<Offer> offers)
> The semantics of replaceTask could be as follows:
> 1) The framework issues replaceTask with the task id of an old running task, a 
> new task info, and optionally a set of offers: replaceTask(oldTaskId, 
> newTaskInfo, offers)
> 2) The master validates the offers and the task (with respect to the resource 
> requirements section) by reusing the offer and task visitors, and sends the 
> replaceTask request to the slave.
>      a) If the new task violates resource requirements or consistency, 
> TASK_LOST is reported for the new task and the entire request ends.
> 3) The slave stores and checkpoints the request (a tuple of old task id and 
> new task info), sends killTask to the executor, and starts a timer. Auxiliary 
> resources (from the optional offers) are reserved before issuing the killTask.
>      a) If the executor is reregistering, enqueue the entire request, reserve 
> resources and start a timer.
>      b) If the executor is terminating or terminated, TASK_LOST is reported 
> for the new task as the old task's resources cannot be reused.
>      c) If the old task is unknown, it might be about to be reregistered by an 
> executor. Enqueue the entire request, reserve resources and start a timer.
>      d) If the timer times out and a terminal status update (e.g. TASK_KILLED) 
> has not been received, the reserved resources are released, TASK_LOST is sent 
> for the new task, and the replace request is removed. If the executor doesn't 
> report anything, this is no different from an unresponsive killTask().
> NOTE: Timers started at request start and during executor reregistration can 
> be merged.
> 4) The executor sends a terminal status update for the old task as the usual 
> response to killTask(). However, if the old task id is marked to be replaced, 
> the slave now intercepts the stage between releasing resources and announcing 
> the change to the isolator. In this stage, if the new task consumes less than 
> the old task, the unneeded resources are announced available to the isolator. 
> Thereafter, a launchTask request for the new task is sent to the executor. The 
> reserved resources are released (but not announced available) so they become 
> usable by the regular launchTask() request.
> 5) Start a timer and await a TASK_RUNNING update. If a terminal status update 
> (or no update at all) is received within the timeout, a rollback is attempted: 
> if the new task uses at least as many resources as the old task, the old task 
> should be restartable in place. If the old task is successfully restarted, a 
> TASK_ROLLEDBACK status update, rather than TASK_RUNNING, should be reported to 
> the framework.
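The slave-side dispatch in step 3 (cases a-c) and the rollback precondition from step 5 can be sketched as small decision functions. The enum and function names below are mine, purely illustrative, not existing Mesos code:

```cpp
#include <cassert>

// Executor states the slave may observe when a replace request arrives.
enum class ExecutorState { REGISTERED, REREGISTERING, TERMINATING, TERMINATED };

// What the slave does with the request (illustrative names only).
enum class Action {
  KILL_AND_START_TIMER,  // normal path: killTask() plus a timeout timer
  ENQUEUE_AND_WAIT,      // 3a/3c: reserve resources, wait for reregistration
  SEND_TASK_LOST         // 3b: old task's resources cannot be reused
};

// Step 3 dispatch, following cases a)-c) above.
Action onReplaceRequest(ExecutorState state, bool oldTaskKnown) {
  if (state == ExecutorState::TERMINATING ||
      state == ExecutorState::TERMINATED) {
    return Action::SEND_TASK_LOST;
  }
  if (state == ExecutorState::REREGISTERING || !oldTaskKnown) {
    return Action::ENQUEUE_AND_WAIT;
  }
  return Action::KILL_AND_START_TIMER;
}

// Step 5 precondition: rollback is only possible when the new task held at
// least the old task's resources, so the old allocation is still covered.
bool rollbackPossible(double newCpus, double oldCpus,
                      double newMem, double oldMem) {
  return newCpus >= oldCpus && newMem >= oldMem;
}
```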
> Resource requirements:
> The resources required for the new task must be covered by:
> - Using the same or fewer resources than the old task: T_{new} <= T_{old}
> - Using more than the old task's resources, but no more than the sum of the 
> old resources and the aggregated offer resources: T_{new} <= T_{old} + O_{1} 
> + … + O_{n}
> Any resources provided but not used by the new task are announced as 
> available.
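Both requirement cases reduce to one componentwise check, T_new <= T_old + O_1 + ... + O_n. A minimal sketch, assuming a resource vector of only cpus and mem (the real Mesos Resources type also covers disk, ports, etc.):

```cpp
#include <cassert>
#include <vector>

// Simplified resource vector.
struct Res { double cpus = 0; double mem = 0; };

// T_new <= T_old + O_1 + ... + O_n, checked componentwise.
bool covered(const Res& tnew, const Res& told,
             const std::vector<Res>& offers) {
  Res avail = told;
  for (const Res& o : offers) {
    avail.cpus += o.cpus;
    avail.mem  += o.mem;
  }
  return tnew.cpus <= avail.cpus && tnew.mem <= avail.mem;
}

// Resources provided but not used by the new task; these would be
// announced as available again.
Res leftover(const Res& tnew, const Res& told,
             const std::vector<Res>& offers) {
  Res avail = told;
  for (const Res& o : offers) {
    avail.cpus += o.cpus;
    avail.mem  += o.mem;
  }
  return Res{avail.cpus - tnew.cpus, avail.mem - tnew.mem};
}
```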
> Fault tolerance (and open questions):
> - What happens if master fails after receiving a replaceTask request?
>      - Master does not store any state on replacements.
> - What happens if slave fails after receiving a replaceTask request?
>      - The slave, however, does become aware of replacements in flight, which 
> requires either 1) checkpointing replacement requests to stable storage, or 
> 2) letting the executor be aware of the replace request and letting 
> reregistration reestablish lost requests.
> - What happens if slave fails while waiting for executor behavior?
>      - Waiting for executor to run
>      - Waiting for old task to reregister
>      - Killing old task while holding up resources for new task
>      - Just after receiving a terminal update for the old task.
>      - Monitoring new task startup
> - Reconciliation behavior
> This can end up being a complicated operation in the slave, but it requires 
> virtually no changes in executors. 
> An alternative approach could relieve the slave of the replace logic by 
> pushing the responsibility onto executors.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
