Maxim Khutornenko created AURORA-1240:
-----------------------------------------
Summary: Deprecate UpdateConfig "restart_threshold" setting
Key: AURORA-1240
URL: https://issues.apache.org/jira/browse/AURORA-1240
Project: Aurora
Issue Type: Task
Components: Client, Scheduler
Reporter: Maxim Khutornenko
The UpdateConfig {{restart_threshold}} [1] setting does not appear to deliver
much user value: it is highly sensitive to scheduling performance and may
result in aborted or rolled-back job updates when set too low.
Some background: this timeout bounds how long a task may take to transition
from {{PENDING}} to {{RUNNING}} during a job update. When the cluster is short
on capacity, assigning a task to a host may take considerably longer, so the
timeout expires and, depending on the failure settings, causes an unnecessary
job update abort or rollback. The setting was meant to give users some
protection against unsatisfiable resource/constraint requirements. In practice,
it has proved to be more of an annoyance, since updates get interrupted by
unexpected delays in task assignment.
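
To make the failure mode concrete, here is a minimal sketch of how such a
timeout can turn a slow-but-healthy update into a rollback. It is not Aurora
code: the class, the function names, and the two failure-handling fields
({{max_failed_instances}}, {{rollback_on_failure}}) are hypothetical stand-ins
for "the failure settings" mentioned above.

```python
# Illustrative sketch only: names and logic are assumptions, not Aurora code.
from dataclasses import dataclass
from typing import Iterable


@dataclass
class UpdateSettings:
    restart_threshold_secs: float  # max time allowed for PENDING -> RUNNING
    max_failed_instances: int      # hypothetical failure tolerance
    rollback_on_failure: bool      # hypothetical: roll back instead of abort


def instance_failed(secs_to_running: float, settings: UpdateSettings) -> bool:
    """An instance counts as failed if it reached RUNNING too late."""
    return secs_to_running > settings.restart_threshold_secs


def update_outcome(secs_to_running: Iterable[float],
                   settings: UpdateSettings) -> str:
    """Classify an update from per-instance PENDING -> RUNNING durations."""
    failures = sum(instance_failed(s, settings) for s in secs_to_running)
    if failures <= settings.max_failed_instances:
        return "SUCCEEDED"
    return "ROLLED_BACK" if settings.rollback_on_failure else "ABORTED"


settings = UpdateSettings(restart_threshold_secs=60,
                          max_failed_instances=0,
                          rollback_on_failure=True)

# Plenty of capacity: every task is scheduled quickly.
fast_cluster = update_outcome([5, 8, 12], settings)

# Capacity shortage: the same healthy tasks simply wait longer in PENDING.
busy_cluster = update_outcome([5, 95, 120], settings)
```

In this sketch the task definition is identical in both runs; only the
scheduling delay differs, yet the second update is rolled back.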
Consider deprecating and subsequently removing this setting.
[1] https://github.com/apache/aurora/blob/master/docs/configuration-reference.md#updateconfig-objects