[ 
https://issues.apache.org/jira/browse/AURORA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286209#comment-14286209
 ] 

Bill Farner commented on AURORA-1041:
-------------------------------------

Do you propose this update strategy become the default and/or replace the 
existing strategies?

> Allow job uptime stats to control scheduler updater pace 
> ---------------------------------------------------------
>
>                 Key: AURORA-1041
>                 URL: https://issues.apache.org/jira/browse/AURORA-1041
>             Project: Aurora
>          Issue Type: Task
>          Components: Client, Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>
> The current implementation of the scheduler updater relies on a user-defined 
> {{batch_size}} value to determine how many instances can be updated 
> simultaneously. While this approach is well understood and battle tested, it 
> comes with its own risks/inefficiencies:
> - No knowledge of job health outside of an active batch. Once an instance 
> passes the {{watch_secs}} interval it's considered "healthy" and is never 
> looked at by the updater again. Even if updated instances start flapping 
> later, the updater keeps going.
> - The fixed {{batch_size}} value may artificially slow down updater 
> progress, as it's usually chosen conservatively as the max number of 
> instances a service can tolerate being down at any given moment and may not 
> reflect the actual job restart pace (see related AURORA-894).
> - Instances are evaluated/updated in an ordered fashion, so when an update 
> both changes existing instances and adds new ones, the new instances come 
> up only at the very end of the update sequence.
> The proposed solution will capitalize on the concept of *job uptime* 
> introduced in AURORA-290 and will allow the scheduler updater to proceed as 
> long as the "X% of instances up over Y interval" job invariant is met.
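
The "X% of instances up over Y interval" invariant quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions, not Aurora's updater or its API: the names Instance, up_secs, next_batch, min_up_percent and interval_secs are all hypothetical, and an instance is assumed to count as "up" once it has been running continuously for at least the interval.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Instance:
    # All fields are hypothetical; they do not mirror Aurora's data model.
    instance_id: int
    up_secs: float   # how long the instance has been continuously up
    updated: bool    # True once the instance runs the new configuration


def next_batch(instances: List[Instance],
               min_up_percent: int,
               interval_secs: float) -> List[int]:
    """Pick instances that can be restarted without breaking the invariant
    "min_up_percent of instances up for at least interval_secs"."""
    total = len(instances)
    # Integer ceiling of total * min_up_percent / 100.
    required_up = (total * min_up_percent + 99) // 100
    healthy = [i for i in instances if i.up_secs >= interval_secs]
    # Restarting a healthy instance removes it from the healthy set,
    # so the budget is the surplus above the required count.
    budget = len(healthy) - required_up
    if budget <= 0:
        return []  # invariant at risk: the updater should wait
    candidates = [i for i in healthy if not i.updated]
    return [i.instance_id for i in candidates[:budget]]


# Ten stable instances, 80% must stay up over a 300 s interval.
job = [Instance(n, up_secs=600.0, updated=False) for n in range(10)]
first = next_batch(job, min_up_percent=80, interval_secs=300.0)

# Instances 0 and 1 were just restarted on the new config.
for n in first:
    job[n] = Instance(n, up_secs=10.0, updated=True)
blocked = next_batch(job, min_up_percent=80, interval_secs=300.0)

# Once they have been up long enough, the update can continue.
for n in first:
    job[n].up_secs = 600.0
second = next_batch(job, min_up_percent=80, interval_secs=300.0)
```

In this sketch the pace is set by observed uptime rather than a fixed batch size: while recently restarted (or flapping) instances keep the healthy count at or below the required threshold, the function returns an empty list and the update pauses.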



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
