Maxim Khutornenko created AURORA-690:
----------------------------------------

             Summary: Add support for external update coordination
                 Key: AURORA-690
                 URL: https://issues.apache.org/jira/browse/AURORA-690
             Project: Aurora
          Issue Type: Story
          Components: Client, Scheduler
            Reporter: Maxim Khutornenko


With the introduction of scheduler-driven job update orchestration (AURORA-610),
it will be a bit harder for a user to interrupt a job update that has gone wrong
(e.g. bad binary, incorrect settings, changed external conditions, etc.).
Instead of aborting the update process via CTRL-C (client updater), users would
have to run an abort/pause command that risks never reaching the scheduler in
case of a client network partition.

To compensate for the above, it would be great for the scheduler to optionally
support an inverted dependency model where the updater would willingly pause
job update progress upon reaching certain checkpoints and wait for the
client/external service to explicitly "ack" it (i.e. via the resumeJobUpdate RPC).
Such checkpoints could be:
- predefined number of instances reached
- percentage of completion
- time-based heartbeat (HB) intervals
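
To make the first two checkpoint kinds concrete, here is a minimal sketch of
how an updater might decide when to pause. It is only an illustration: the
CheckpointPolicy class, its fields and the is_checkpoint function are
hypothetical names, not part of Aurora, and the percentage is assumed to be a
whole number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckpointPolicy:
    """Hypothetical policy object; none of these names come from Aurora."""
    every_n_instances: Optional[int] = None  # pause after each N updated instances
    every_percent: Optional[int] = None      # pause after each P percent completed


def is_checkpoint(policy: CheckpointPolicy, updated: int, total: int) -> bool:
    """Return True if the updater should pause after `updated` of `total` instances."""
    if updated <= 0 or updated >= total:
        return False  # nothing done yet, or the update is already complete
    if policy.every_n_instances and updated % policy.every_n_instances == 0:
        return True
    if policy.every_percent:
        # Integer bucket index: which multiple of P percent has been reached.
        width = policy.every_percent * total
        return (updated * 100) // width > ((updated - 1) * 100) // width
    return False
```

With CheckpointPolicy(every_percent=25) and 20 instances, the updater would
pause after instances 5, 10 and 15 and wait for an explicit ack before
continuing.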

Arguably, the time-based HB approach would be the most versatile, addressing 
the majority of cases.
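
A rough sketch of the time-based HB idea, under the assumption that each ack
from the client/external service counts as a heartbeat and that the updater
pauses once no ack has arrived within the configured interval. The
HeartbeatGate class and its methods are hypothetical names for illustration
only, not an existing Aurora API.

```python
import time


class HeartbeatGate:
    """Hypothetical scheduler-side gate: the update proceeds only while acks are fresh."""

    def __init__(self, interval_secs, clock=time.monotonic):
        self.interval_secs = interval_secs
        self.clock = clock          # injectable so the gate can be tested without sleeping
        self.last_ack = clock()     # treat the update start as the first heartbeat

    def ack(self):
        """Record a heartbeat from the client/external service."""
        self.last_ack = self.clock()

    def should_pause(self):
        """True once more than interval_secs has passed since the last ack."""
        return self.clock() - self.last_ack > self.interval_secs
```

The updater would check should_pause() before touching the next instance. If
the client is partitioned from the scheduler, heartbeats stop arriving and the
update pauses on its own, which is the inverted dependency described above.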

Generalizing further, this feature would be useful for building external update 
coordination services where Aurora service job upgrades are controlled by 
application-specific health tracking systems that throttle individual job 
updates based on their internal health/traffic metrics.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
