[ https://issues.apache.org/jira/browse/AURORA-1749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Igor Morozov updated AURORA-1749:
---------------------------------
Assignee: Igor Morozov
> Get a support for distributed job update coordination
> -----------------------------------------------------
>
> Key: AURORA-1749
> URL: https://issues.apache.org/jira/browse/AURORA-1749
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: Igor Morozov
> Assignee: Igor Morozov
>
> This covers the use case of updating identical jobs that are spread across
> multiple datacenters and managed by different Aurora clusters.
> For example, we have a service job test-service that runs in two datacenters,
> dc1 and dc2.
> Logically, the job needs to be updated in lockstep across all datacenters,
> and if any job update fails and goes into the ROLLING_BACK state, all the
> others need to start a rollback as well.
>
> This is what we want to achieve with this change:
> 1. Coordinator starts an update:
> dc1: starting update1 for job1
> dc2: starting update2 for job2
> 2. Coordinator:
> dc1: update1 is done and enters a paused state
> dc2: update2 has failed and is rolling back
> 3. Coordinator:
> dc1: starts rolling back update1
> dc2: update2 is rolled back
> 4. Coordinator:
> dc1: update1 is rolled back
> dc2: update2 is rolled back
> Currently step 2 is impossible in Aurora: a completed job update enters a
> terminal state and cannot be rolled back from it.
> There was some discussion in the AURORA-1721 ticket about using another job
> update to roll the job forward to its previous version, effectively simulating
> a rollback. But then, to reconcile the state of the actual update operation,
> one would need to consider two or more job updates and differentiate between
> a normal ROLLED_FORWARD update and a ROLLED_FORWARD update that is really a
> rollback. That feels quite artificial and error prone. We believe the ability
> to run a coordinated job update across multiple datacenters should be a
> first-class citizen in Aurora.
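The lockstep rule and the four-step scenario above can be sketched as a pure decision function for the coordinator. This is a minimal, hypothetical model, not Aurora's API: the `Status` values are a simplified subset, `DONE_PAUSED` stands in for the non-terminal "done but paused" state that step 2 requires, and the action strings returned by `next_actions` are illustrative names. A real coordinator would presumably poll each cluster's scheduler for the update status and issue the corresponding calls.

```python
from enum import Enum
from typing import Dict


class Status(Enum):
    """Simplified per-cluster update states (hypothetical model, not Aurora's API)."""
    ROLLING_FORWARD = "ROLLING_FORWARD"
    # Hypothetical non-terminal state: the roll-forward finished, but the update
    # is held so that the coordinator can still order a rollback (step 2).
    DONE_PAUSED = "DONE_PAUSED"
    ROLLING_BACK = "ROLLING_BACK"
    ROLLED_BACK = "ROLLED_BACK"
    # Terminal today: no rollback is possible from this state.
    ROLLED_FORWARD = "ROLLED_FORWARD"


def next_actions(statuses: Dict[str, Status]) -> Dict[str, str]:
    """Return one coordinator action per datacenter, given current statuses."""
    any_rollback = any(
        s in (Status.ROLLING_BACK, Status.ROLLED_BACK) for s in statuses.values()
    )
    all_done = all(s is Status.DONE_PAUSED for s in statuses.values())
    actions: Dict[str, str] = {}
    for dc, s in statuses.items():
        if any_rollback:
            # Lockstep: once any cluster rolls back, every cluster must follow.
            if s in (Status.ROLLING_FORWARD, Status.DONE_PAUSED):
                actions[dc] = "rollback"
            elif s is Status.ROLLED_FORWARD:
                actions[dc] = "cannot_rollback"  # the gap this ticket describes
            elif s is Status.ROLLING_BACK:
                actions[dc] = "wait"
            else:
                actions[dc] = "done"  # already ROLLED_BACK
        elif all_done:
            # Every cluster finished: release the holds and let updates complete.
            actions[dc] = "finish"
        else:
            # Some cluster is still rolling forward: keep finished ones paused.
            actions[dc] = "wait"
    return actions


# Step 2: dc1 finished and is held; dc2 failed and is rolling back.
print(next_actions({"dc1": Status.DONE_PAUSED, "dc2": Status.ROLLING_BACK}))
# {'dc1': 'rollback', 'dc2': 'wait'}

# Step 4: both updates are rolled back.
print(next_actions({"dc1": Status.ROLLED_BACK, "dc2": Status.ROLLED_BACK}))
# {'dc1': 'done', 'dc2': 'done'}

# Behavior today: dc1 already reached the terminal ROLLED_FORWARD state.
print(next_actions({"dc1": Status.ROLLED_FORWARD, "dc2": Status.ROLLING_BACK}))
# {'dc1': 'cannot_rollback', 'dc2': 'wait'}
```

Keeping the decision logic free of I/O makes it easy to check against each step of the scenario, and the `cannot_rollback` case shows why a terminal ROLLED_FORWARD state blocks step 2.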