[ 
https://issues.apache.org/jira/browse/AURORA-690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277327#comment-14277327
 ] 

Maxim Khutornenko commented on AURORA-690:
------------------------------------------

Proposed high-level design flow: 
https://github.com/maxim111333/incubator-aurora/blob/hb_doc/docs/update-heartbeat.md

> Add support for external update coordination
> --------------------------------------------
>
>                 Key: AURORA-690
>                 URL: https://issues.apache.org/jira/browse/AURORA-690
>             Project: Aurora
>          Issue Type: Epic
>          Components: Client, Scheduler
>            Reporter: Maxim Khutornenko
>            Priority: Critical
>              Labels: twitter
>
> With the introduction of scheduler-driven job update orchestration 
> (AURORA-610), it will be a bit harder for a user to interrupt a job update 
> process that has gone wrong (e.g. bad binary, incorrect settings, changed 
> external conditions, etc.). Instead of aborting the update process via 
> CTRL-C (client updater), users would have to run an abort/pause command, 
> which risks never reaching the scheduler if the client is network-partitioned.
> To compensate for this, it would be great for the scheduler to optionally 
> support an inverted dependency model where the updater would willingly pause 
> job update progress upon reaching certain checkpoints and wait for the 
> client/external service to explicitly "ack" it (e.g. via a resumeJobUpdate RPC).
> Such checkpoints could be:
> - predefined number of instances reached
> - percentage of completion
> - time-based heartbeat (HB) intervals
> Arguably, the time-based HB approach would be the most versatile, covering 
> the majority of use cases.
> Generalizing further, this feature would be useful for building external 
> update coordination services, where Aurora service job upgrades are controlled 
> by application-specific health tracking systems that throttle individual job 
> updates based on internal health/traffic metrics.
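A minimal Python sketch of the inverted dependency model described above, assuming a toy in-memory updater rather than Aurora's actual scheduler code. `GatedUpdate`, `CheckpointPolicy`, `FakeClock`, `ack()`, and the state strings are all hypothetical; `ack()` only stands in for a resumeJobUpdate-style call from the client or an external coordinator. It models the three checkpoint kinds from the list (instance count, percentage of completion, time-based heartbeat).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckpointPolicy:
    # Hypothetical knobs; None disables that checkpoint type.
    every_n_instances: Optional[int] = None  # pause after every N updated instances
    every_percent: Optional[int] = None      # pause each time another X% completes
    heartbeat_secs: Optional[float] = None   # pause if no ack arrived within this window


class FakeClock:
    """Manually advanced clock so the heartbeat logic can be exercised deterministically."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


class GatedUpdate:
    """Toy updater that stops at checkpoints until an external ack arrives."""

    def __init__(self, total: int, policy: CheckpointPolicy, clock: Callable[[], float]):
        self.total = total
        self.policy = policy
        self.clock = clock
        self.done = 0
        self.paused = False
        self.last_ack = clock()

    def ack(self) -> None:
        # Hypothetical stand-in for a resumeJobUpdate-style RPC: unpause and reset the heartbeat.
        self.paused = False
        self.last_ack = self.clock()

    def _at_checkpoint(self) -> bool:
        p = self.policy
        if p.every_n_instances and self.done % p.every_n_instances == 0:
            return True
        if p.every_percent:
            # Pause when completion crosses into a new every_percent bucket.
            before = (self.done - 1) * 100 // self.total // p.every_percent
            after = self.done * 100 // self.total // p.every_percent
            if after > before:
                return True
        return False

    def step(self) -> str:
        """Update one more instance unless paused; return "RUNNING", "PAUSED" or "DONE"."""
        if self.done >= self.total:
            return "DONE"
        hb = self.policy.heartbeat_secs
        if hb is not None and self.clock() - self.last_ack > hb:
            self.paused = True  # heartbeat missed (e.g. client partitioned): stop and wait
        if self.paused:
            return "PAUSED"
        self.done += 1  # placeholder for actually updating one instance
        if self.done >= self.total:
            return "DONE"
        if self._at_checkpoint():
            self.paused = True
        return "PAUSED" if self.paused else "RUNNING"


if __name__ == "__main__":
    clock = FakeClock()
    upd = GatedUpdate(4, CheckpointPolicy(every_n_instances=2, heartbeat_secs=30.0), clock)
    print([upd.step() for _ in range(3)])  # ['RUNNING', 'PAUSED', 'PAUSED']
    upd.ack()                              # client/external service "acks" the checkpoint
    print([upd.step() for _ in range(2)])  # ['RUNNING', 'DONE']
```

The point of the inversion is the fail-safe default: if the client is partitioned, acks simply stop arriving and the update halts at the next checkpoint or heartbeat expiry, instead of running to completion unattended. An external coordination service would call `ack()` only while its own health/traffic metrics look good.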



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
