[ 
https://issues.apache.org/jira/browse/AURORA-1721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15400116#comment-15400116
 ] 

David McLaughlin commented on AURORA-1721:
------------------------------------------

I am also +1 to (1). I like the idea of explicitly being able to force the 
update into ROLLING_BACK while it is in progress.

I'm -1 to (2). A valid state transition is a user explicitly pausing the 
update. Should we block that until a pulse? It just adds needless complexity to 
the product and UX.  

If you have something external that knows how to send a pulse to verify that
everything is OK, can't that thing also keep track of the previous state in
order to roll back? In fact, I'd say that longer term you will eventually run
into the situation where someone comes to you and says they need to roll back
after several hours (or even days) in production, long after you decided to
mark the update as ROLLED_FORWARD. This is what influenced our solution at
Twitter (to save JobUpdateRequests and allow users to replay them).

You may be concerned about how to get a previous state in order to call the
Aurora API. A JobUpdateRequest representing the previous state can easily be
constructed from the JobUpdateDetails. The JobUpdateRequest has only three
parameters: a TaskConfig, an instance count and the JobUpdateSettings.

{code}
struct JobUpdateRequest {
  /** Desired TaskConfig to apply. */
  1: TaskConfig taskConfig

  /** Desired number of instances of the task config. */
  2: i32 instanceCount

  /** Update settings and limits. */
  3: JobUpdateSettings settings
}
{code}

To get these values from a JobUpdateDetails:

{code}
struct JobUpdateDetails {
  /** Update definition. */
  1: JobUpdate update
  ...
}

/** Full definition of the job update. */
struct JobUpdate {
   ...
  /** Update configuration. */
  2: JobUpdateInstructions instructions
}

struct JobUpdateInstructions {
  /** Actual InstanceId -> TaskConfig mapping when the update was requested. */
  1: set<InstanceTaskConfig> initialState

  /** Update specific settings. */
  3: JobUpdateSettings settings
}

struct InstanceTaskConfig {
  /** A TaskConfig associated with instances. */
  1: TaskConfig task

  /** Instances associated with the TaskConfig. */
  2: set<Range> instances
}
{code}

So using these structs, a rollback algorithm based on JobUpdateKey could be:

{code}
// Fetch the update to undo; initialState holds the configs in place before it started.
JobUpdateDetails oldUpdate = client.getJobUpdateDetails(new JobUpdateKey(...));
for (InstanceTaskConfig iConfig : oldUpdate.update.instructions.initialState) {
  // Build a request that restores this pre-update TaskConfig.
  JobUpdateRequest rollbackRequest = new JobUpdateRequest(
      iConfig.task,
      calculateInstanceCount(iConfig.instances),
      oldUpdate.update.instructions.settings);
  client.startJobUpdate(rollbackRequest);
}
{code}
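
The calculateInstanceCount helper in the snippet above is not defined anywhere, so here is a minimal sketch of what it might look like. It assumes that Range is an inclusive [first, last] pair of instance ids and that the ranges in one InstanceTaskConfig do not overlap. The Range class below is a hypothetical stand-in for the thrift-generated type, not the real one.

```java
import java.util.Set;

/** Hypothetical stand-in for the thrift Range; assumed to be inclusive on both ends. */
final class Range {
  final int first;
  final int last;

  Range(int first, int last) {
    this.first = first;
    this.last = last;
  }
}

final class InstanceCounts {
  private InstanceCounts() {}

  /** Sums the sizes of the inclusive ranges, e.g. {0-2, 5-6} covers 5 instances. */
  static int calculateInstanceCount(Set<Range> instances) {
    int count = 0;
    for (Range r : instances) {
      if (r.last < r.first) {
        throw new IllegalArgumentException(
            "Invalid range: " + r.first + "-" + r.last);
      }
      // Both ends are inclusive, hence the +1.
      count += r.last - r.first + 1;
    }
    return count;
  }
}
```

If the ranges could overlap, the ids would need to be de-duplicated before counting.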

In practice though, it might be easier for you to just persist JobUpdateRequest 
instances from the client (we added a custom noun in our client to generate the 
JobUpdateRequest from the Aurora DSL, can discuss if you want) and store them 
for rollbacks. 
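
To illustrate the "persist and replay" idea, here is a rough sketch of a client-side history of requests per job. Everything in it is hypothetical: the UpdateRequestHistory class, its method names and the string job key are made up for illustration, the type parameter R stands in for JobUpdateRequest, and a real version would write to durable storage rather than memory.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory history of update requests; R stands in for JobUpdateRequest. */
final class UpdateRequestHistory<R> {
  // Most recent request is at the head of each deque.
  private final Map<String, Deque<R>> byJob = new HashMap<>();

  /** Record a request once the corresponding update has been started. */
  void record(String jobKey, R request) {
    byJob.computeIfAbsent(jobKey, k -> new ArrayDeque<>()).push(request);
  }

  /** Returns the request before the most recent one, i.e. the one to replay for a rollback. */
  Optional<R> rollbackTarget(String jobKey) {
    Deque<R> history = byJob.get(jobKey);
    if (history == null || history.size() < 2) {
      return Optional.empty();
    }
    Iterator<R> newestFirst = history.iterator();
    newestFirst.next(); // skip the request that is currently deployed
    return Optional.of(newestFirst.next());
  }
}
```

Because the whole history is kept, the same structure could also serve a rollback requested days later by walking further back than one entry.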

Thoughts?

> Support user initiated rollback 
> --------------------------------
>
>                 Key: AURORA-1721
>                 URL: https://issues.apache.org/jira/browse/AURORA-1721
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Igor Morozov
>            Assignee: Igor Morozov
>              Labels: Uber
>             Fix For: 0.16.0
>
>
> The proposal to support user initiated rollback:
> 1. Create new thrift API:
>  /**Rollback job update. */
>   Response rollbackJobUpdate(
>       /** The update to rollback. */
>       1: JobUpdateKey key,
>       /** A user-specified message to include with the induced job update 
> state change. */
>       3: string message)
> 2. Implement the new API in the scheduler so that the implementation would
> just undo the latest JobUpdate, effectively trying to apply initialState to
> the job. If that is for some reason impossible, then the rollback will fail
> with an appropriate error message.
> 3. Support new aurora client command 'rollback'



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
