[
https://issues.apache.org/jira/browse/AURORA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Lambert updated AURORA-690:
---------------------------------
Sprint: Aurora Q4 Sprint 1, Aurora Q4 Sprint 2 (was: Aurora Q4 Sprint 1)
> Add support for external update coordination
> --------------------------------------------
>
> Key: AURORA-690
> URL: https://issues.apache.org/jira/browse/AURORA-690
> Project: Aurora
> Issue Type: Story
> Components: Client, Scheduler
> Reporter: Maxim Khutornenko
> Assignee: Maxim Khutornenko
> Priority: Critical
>
> With the introduction of scheduler-driven job update orchestration
> (AURORA-610), it will be somewhat harder for a user to interrupt a job update
> that has gone wrong (e.g. bad binary, incorrect settings, changed external
> conditions). Instead of aborting the update process via CTRL-C
> (client updater), users would have to run an abort/pause command that risks
> never reaching the scheduler if the client is network-partitioned.
> To compensate for this, it would be great for the scheduler to optionally
> support an inverted dependency model in which the updater voluntarily pauses
> job update progress upon reaching certain checkpoints and waits for the
> client/external service to explicitly "ack" it (i.e. a resumeJobUpdate RPC).
> Such checkpoints could be:
> - predefined number of instances reached
> - percentage of completion
> - time-based heartbeat (HB) intervals
> Arguably, the time-based HB approach is the most versatile and would address
> the majority of cases.
> Generalizing further, this feature would be useful for building external
> update coordination services in which Aurora service job upgrades are controlled
> by application-specific health tracking systems that throttle individual job
> updates based on internal health/traffic metrics.
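
For illustration only, the time-based heartbeat checkpoint described above could be modeled roughly as follows. This is a minimal sketch, not Aurora code: the class name `HeartbeatCoordinator`, its methods, and the `heartbeat_timeout_secs` parameter are all hypothetical, and the `ack()` call merely stands in for whatever RPC (such as the resumeJobUpdate mentioned above) the scheduler would actually expose.

```python
import time


class HeartbeatCoordinator:
    """Hypothetical model of an update that pauses itself when the
    external coordinator stops sending heartbeats (assumed design)."""

    def __init__(self, heartbeat_timeout_secs, clock=time.monotonic):
        self._timeout = heartbeat_timeout_secs
        self._clock = clock              # injectable so the logic can be tested
        self._last_ack = clock()         # treat update start as the first ack
        self.paused = False

    def ack(self):
        """Record a heartbeat; an ack also resumes a paused update."""
        self._last_ack = self._clock()
        self.paused = False

    def may_proceed(self):
        """Return True if the updater may act on the next instance."""
        if self._clock() - self._last_ack > self._timeout:
            # No ack arrived within the interval: pause until the next ack.
            self.paused = True
        return not self.paused
```

In this model the updater would call `may_proceed()` before touching each instance, so a partitioned client causes the update to stop by default instead of requiring an abort command to reach the scheduler.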
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)