Repository: aurora
Updated Branches:
  refs/heads/master 2026ca040 -> 6cb2d4f69

Bump initial_task_kill_retry_interval to 15s.

It is not very common that kills are dropped by Mesos and have to be retried
by Aurora. It therefore makes sense to slightly increase the retry timeout so
that we don't retry needlessly when Thermos is still busy executing the
lifecycle methods.

By default, Thermos uses the following kill escalation sequence:

* /quitquitquit
* wait 5s
* /abortabortabort
* wait 5s
* SIGTERM
* wait up to 1 minute
* SIGKILL

Reviewed at https://reviews.apache.org/r/58611/


Project: http://git-wip-us.apache.org/repos/asf/aurora/repo
Commit: http://git-wip-us.apache.org/repos/asf/aurora/commit/6cb2d4f6
Tree: http://git-wip-us.apache.org/repos/asf/aurora/tree/6cb2d4f6
Diff: http://git-wip-us.apache.org/repos/asf/aurora/diff/6cb2d4f6

Branch: refs/heads/master
Commit: 6cb2d4f698a75edf15d3688b4b39e2e6e7467fdd
Parents: 2026ca0
Author: Stephan Erb <[email protected]>
Authored: Tue Apr 25 23:18:30 2017 +0200
Committer: Stephan Erb <[email protected]>
Committed: Tue Apr 25 23:18:30 2017 +0200

----------------------------------------------------------------------
 .../aurora/scheduler/reconciliation/ReconciliationModule.java      | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/aurora/blob/6cb2d4f6/src/main/java/org/apache/aurora/scheduler/reconciliation/ReconciliationModule.java
----------------------------------------------------------------------
diff --git a/src/main/java/org/apache/aurora/scheduler/reconciliation/ReconciliationModule.java b/src/main/java/org/apache/aurora/scheduler/reconciliation/ReconciliationModule.java
index e076e80..80fc616 100644
--- a/src/main/java/org/apache/aurora/scheduler/reconciliation/ReconciliationModule.java
+++ b/src/main/java/org/apache/aurora/scheduler/reconciliation/ReconciliationModule.java
@@ -59,7 +59,7 @@ public class ReconciliationModule extends AbstractModule {
       help = "When killing a task, retry after this delay if mesos has not responded,"
           + " backing off up to transient_task_state_timeout")
   private static final Arg<Amount<Long, Time>> INITIAL_TASK_KILL_RETRY_INTERVAL =
-      Arg.create(Amount.of(5L, Time.SECONDS));
+      Arg.create(Amount.of(15L, Time.SECONDS));
 
   // Reconciliation may create a big surge of status updates in a large cluster.  Setting the default
   // initial delay to 1 minute to ease up storage contention during scheduler start up.
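
To illustrate the reasoning in the commit message, the sketch below compares kill-retry schedules against the worst-case Thermos escalation window (5s + 5s + up to 1 minute). It is not Aurora code: the class `KillRetrySketch` and the method `retryTimes` are hypothetical, the doubling backoff is an assumption about what "backing off" means here, and the 300s cap is an assumed value for transient_task_state_timeout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Aurora: lists the times at which a kill
// would be retried while Thermos may still be running its escalation sequence.
class KillRetrySketch {
  // Worst case before Thermos sends SIGKILL: wait 5s + wait 5s + wait up to 60s.
  static final long ESCALATION_WORST_CASE_SECS = 5 + 5 + 60;

  // Assumption: the delay doubles after every retry and is capped at maxSecs.
  // Returns the retry times (seconds after the first kill) up to horizonSecs.
  static List<Long> retryTimes(long initialSecs, long maxSecs, long horizonSecs) {
    List<Long> times = new ArrayList<>();
    long delay = initialSecs;
    long t = delay;
    while (t <= horizonSecs) {
      times.add(t);
      delay = Math.min(delay * 2, maxSecs);
      t += delay;
    }
    return times;
  }

  public static void main(String[] args) {
    long cap = 300; // assumed transient_task_state_timeout, in seconds
    System.out.println("old default (5s):  "
        + retryTimes(5, cap, ESCALATION_WORST_CASE_SECS));
    System.out.println("new default (15s): "
        + retryTimes(15, cap, ESCALATION_WORST_CASE_SECS));
  }
}
```

Under these assumptions, the old 5s default retries at 5s, 15s and 35s within the 70s window, while the new 15s default retries at 15s and 45s, so one fewer kill is re-sent while Thermos may still be working through the lifecycle methods.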
