[
https://issues.apache.org/jira/browse/AURORA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400186#comment-15400186
]
Igor Morozov commented on AURORA-1721:
--------------------------------------
We can't create a new job update for rollback, at least not easily, because our
infrastructure has been built around the concept of rollbackable workflows. By
creating a new job update for rollback we would mutate the update workflow
state, which is something we don't want to do for a variety of reasons.
Pausing an update in ROLL_FORWARD_AWAITING_PULSE (or any new state, for that
matter) before going to the ROLLED_FORWARD state is just a way to implement a
two-phase commit for a distributed, coordinated update.
This is what we want to achieve with this change:
Coordinator starts an upgrade:
dc1: -> starting update1 for job1
dc2: -> starting update2 for job2
----
Coordinator:
dc1: update1 is done, enters paused state
dc2: update2 has failed, rolling back
----
Coordinator:
dc1: starts rolling back update1
dc2: update2 is rolled back
----
Coordinator:
dc1: update1 is rolled back
dc2: update2 is rolled back
Without an intermediate state for the job update we would need to create a new
update to roll back, as you suggested, thus mutating the state of the
distributed workflow from (update1, update2) to (update3, update2).
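The coordinator flow above can be sketched roughly as follows. This is only an
illustration in Python, not Aurora code: the Update class, its run/commit/
rollback methods and the coordinate function are hypothetical names, and the
in-memory state changes stand in for real scheduler calls. It assumes that a
successful update parks in ROLL_FORWARD_AWAITING_PULSE and that a failed one
rolls itself back.

```python
from enum import Enum


class State(Enum):
    ROLLING_FORWARD = "ROLLING_FORWARD"
    # Paused after a successful roll forward, waiting for the coordinator.
    ROLL_FORWARD_AWAITING_PULSE = "ROLL_FORWARD_AWAITING_PULSE"
    ROLLED_FORWARD = "ROLLED_FORWARD"
    ROLLED_BACK = "ROLLED_BACK"


class Update:
    """Hypothetical in-memory stand-in for one job update in one datacenter."""

    def __init__(self, update_id, succeeds):
        self.update_id = update_id
        self.succeeds = succeeds  # simulated outcome of the roll forward
        self.state = State.ROLLING_FORWARD

    def run(self):
        # Phase 1: a successful update pauses instead of finishing;
        # a failed update rolls itself back.
        if self.succeeds:
            self.state = State.ROLL_FORWARD_AWAITING_PULSE
        else:
            self.state = State.ROLLED_BACK

    def commit(self):
        # Phase 2 (commit): only a paused update can be finalized.
        if self.state is State.ROLL_FORWARD_AWAITING_PULSE:
            self.state = State.ROLLED_FORWARD

    def rollback(self):
        # Phase 2 (abort): the same update is rolled back, so the set of
        # update ids in the distributed workflow does not change.
        if self.state is State.ROLL_FORWARD_AWAITING_PULSE:
            self.state = State.ROLLED_BACK


def coordinate(updates):
    """Run all updates, then commit all of them or roll all of them back."""
    for update in updates:
        update.run()
    all_paused = all(
        u.state is State.ROLL_FORWARD_AWAITING_PULSE for u in updates
    )
    for update in updates:
        if all_paused:
            update.commit()
        else:
            update.rollback()
    return {u.update_id: u.state.name for u in updates}
```

With one failing datacenter, coordinate([Update("update1", True),
Update("update2", False)]) leaves both updates in ROLLED_BACK, and the workflow
is still identified as (update1, update2).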
If somebody wants to roll back hours after an upgrade is done, they would need
to roll forward with the previous version (a logical rollback).
The use case we're targeting is fast rollbacks for distributed updates.
> Support user initiated rollback
> --------------------------------
>
> Key: AURORA-1721
> URL: https://issues.apache.org/jira/browse/AURORA-1721
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Igor Morozov
> Assignee: Igor Morozov
> Labels: Uber
> Fix For: 0.16.0
>
>
> The proposal to support user initiated rollback:
> 1. Create new thrift API:
> /** Rollback job update. */
> Response rollbackJobUpdate(
>     /** The update to rollback. */
>     1: JobUpdateKey key,
>     /** A user-specified message to include with the induced job update state change. */
>     3: string message)
> 2. Implement the new API in the scheduler so that the implementation just
> undoes the latest JobUpdate, effectively trying to apply initialState to the
> job. If that is impossible for some reason, the rollback will fail with an
> appropriate error message.
> 3. Support a new aurora client command 'rollback'