[ https://issues.apache.org/jira/browse/AURORA-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Farner resolved AURORA-144.
--------------------------------
Resolution: Won't Fix
I disagree with the premise of the ticket. The transient task timeout serves
as a last-ditch effort to make up for silent transient issues in surrounding
systems. If the scheduler is frequently encountering this timeout, there are
serious problems, and the whole cluster is in a perilous state.
I'm closing as won't fix, but please reopen if you don't buy my take on this
and would like to discuss further!
> Dynamic backoff to task timeout value
> -------------------------------------
>
> Key: AURORA-144
> URL: https://issues.apache.org/jira/browse/AURORA-144
> Project: Aurora
> Issue Type: Task
> Components: Reliability, Scheduler
> Reporter: Joe Smith
> Priority: Minor
>
> Although there is a command-line flag to set
> [transient_task_state_timeout|https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L74],
> once it's set it is a fixed value and can only be changed via a manual
> flag change and a scheduler redeploy.
> We may want to look into a way of incorporating task rescheduling feedback
> into the value we set for a timeout. There may also be a better approach than
> a timeout when we're already in a bad state.
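To make the proposal concrete, here is a minimal sketch of what "dynamic backoff" could mean. It assumes the flag value acts only as a starting point, that the timeout grows multiplicatively each time a task is rescheduled for being stuck in a transient state, and that it decays back toward the base as transitions complete normally. The class `AdaptiveTransientTimeout`, its methods `onTimeoutFired`/`onTransitionCompleted`, the backoff factor, the cap, and the 5-minute base are all hypothetical and illustrative. None of them is Aurora's API or the flag's actual default.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch, NOT part of Aurora: a transient-state timeout that
 * starts at a base value (standing in for transient_task_state_timeout),
 * backs off when timeouts fire, and decays back toward the base as tasks
 * leave their transient states on time.
 */
final class AdaptiveTransientTimeout {
  private final long baseMillis;
  private final long maxMillis;
  private final double backoffFactor;
  private long currentMillis;

  AdaptiveTransientTimeout(long baseMillis, long maxMillis, double backoffFactor) {
    if (baseMillis <= 0 || maxMillis < baseMillis || backoffFactor < 1.0) {
      throw new IllegalArgumentException("invalid timeout parameters");
    }
    this.baseMillis = baseMillis;
    this.maxMillis = maxMillis;
    this.backoffFactor = backoffFactor;
    this.currentMillis = baseMillis;
  }

  /** Timeout to apply to the next task entering a transient state. */
  synchronized long currentMillis() {
    return currentMillis;
  }

  /** Feedback: a task was stuck past the timeout and had to be rescheduled. */
  synchronized void onTimeoutFired() {
    // Grow multiplicatively, but never past the cap.
    currentMillis = Math.min(maxMillis, (long) (currentMillis * backoffFactor));
  }

  /** Feedback: a task left its transient state before the timeout. */
  synchronized void onTransitionCompleted() {
    // Shrink back toward the configured base, never below it.
    currentMillis = Math.max(baseMillis, (long) (currentMillis / backoffFactor));
  }

  public static void main(String[] args) {
    AdaptiveTransientTimeout timeout = new AdaptiveTransientTimeout(
        TimeUnit.MINUTES.toMillis(5), TimeUnit.MINUTES.toMillis(30), 2.0);
    timeout.onTimeoutFired();        // 5 min -> 10 min
    timeout.onTimeoutFired();        // 10 min -> 20 min
    timeout.onTimeoutFired();        // 20 min -> capped at 30 min
    System.out.println(TimeUnit.MILLISECONDS.toMinutes(timeout.currentMillis()));
    timeout.onTransitionCompleted(); // 30 min -> 15 min
    System.out.println(TimeUnit.MILLISECONDS.toMinutes(timeout.currentMillis()));
  }
}
```

Running `main` prints 30 and then 15. The cap is the important design choice given the resolution above: if the timeout could grow without bound, a cluster already in a perilous state could keep stuck tasks around indefinitely instead of surfacing the problem.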