[jira] [Commented] (AURORA-1041) Allow job uptime stats to control scheduler updater pace

Maxim Khutornenko (JIRA) Wed, 21 Jan 2015 12:41:12 -0800

    [ 
https://issues.apache.org/jira/browse/AURORA-1041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286219#comment-14286219
 ]


Maxim Khutornenko commented on AURORA-1041:
-------------------------------------------

https://reviews.apache.org/r/29943/

> Allow job uptime stats to control scheduler updater pace 
> ---------------------------------------------------------
>
>                 Key: AURORA-1041
>                 URL: https://issues.apache.org/jira/browse/AURORA-1041
>             Project: Aurora
>          Issue Type: Task
>          Components: Client, Scheduler
>            Reporter: Maxim Khutornenko
>            Assignee: Maxim Khutornenko
>
> The current implementation of the scheduler updater relies on a user-defined 
> {{batch_size}} value to determine how many instances can be updated 
> simultaneously. While this approach is well understood and battle tested, it 
> comes with its own risks/inefficiencies:
> - No knowledge of job health outside of an active batch. Once an instance 
> passes the {{watch_secs}} interval it is considered "healthy" and is never 
> looked at by the updater again. Even if updated instances start flapping 
> later, the updater keeps going.
> - The fixed {{batch_size}} value may artificially slow the updater's 
> progress, as it is usually chosen conservatively as the max number of 
> instances a service can tolerate at any given moment and may not reflect the 
> actual job restart pace (see related AURORA-894).
> - Instances are evaluated/updated in an ordered fashion, so in an update 
> that both updates existing instances and adds new ones, the new instances 
> only come up at the very end of the sequence.
> The proposed solution will capitalize on the concept of *job uptime* 
> introduced in AURORA-290 and will allow the scheduler updater to proceed as 
> long as the "X% of instances up over Y interval" job invariant is met.
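
The "X% of instances up over Y interval" invariant above could be checked 
roughly as follows. This is only a minimal sketch, not the scheduler's actual 
code: the InstanceUptime record and the invariant_met and update_budget 
functions are hypothetical names, and it assumes an instance counts as "up" 
once it has been running continuously for at least the interval.

```python
import math
from dataclasses import dataclass


@dataclass
class InstanceUptime:
    """Hypothetical record: seconds an instance has been continuously running."""
    instance_id: int
    up_secs: float


def count_up(instances, interval_secs):
    """Number of instances that have been up for at least interval_secs."""
    return sum(1 for i in instances if i.up_secs >= interval_secs)


def invariant_met(instances, min_up_percent, interval_secs):
    """True if at least min_up_percent of instances were up over the interval."""
    if not instances:
        return True  # assumption: an empty job trivially satisfies the invariant
    up = count_up(instances, interval_secs)
    return 100.0 * up / len(instances) >= min_up_percent


def update_budget(instances, min_up_percent, interval_secs):
    """How many up instances could be restarted now without breaking the invariant."""
    required = math.ceil(len(instances) * min_up_percent / 100.0)
    return max(0, count_up(instances, interval_secs) - required)


# Example: 10 instances, 9 stable for 10 minutes, 1 restarted recently.
job = [InstanceUptime(i, 600.0) for i in range(9)] + [InstanceUptime(9, 10.0)]
print(invariant_met(job, 80, 300))   # invariant with X=80%, Y=300s
print(update_budget(job, 80, 300))   # instances the updater could touch next
```

Under these assumptions the updater's pace would follow the job's observed 
uptime instead of a fixed {{batch_size}}: the budget shrinks to zero when 
instances start flapping and grows again as they stabilize.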



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
