[ https://issues.apache.org/jira/browse/AURORA-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Farner resolved AURORA-144.
--------------------------------

    Resolution: Won't Fix

I disagree with the premise of the ticket.  The transient task timeout serves  
as a last-ditch effort to make up for silent transient issues in surrounding 
systems.  If the scheduler is frequently encountering this timeout, there are 
serious problems, and the whole cluster is in a perilous state.

I'm closing as won't fix, but please reopen if you don't buy my take on this 
and would like to discuss further!

> Dynamic backoff to task timeout value
> -------------------------------------
>
>                 Key: AURORA-144
>                 URL: https://issues.apache.org/jira/browse/AURORA-144
>             Project: Aurora
>          Issue Type: Task
>          Components: Reliability, Scheduler
>            Reporter: Joe Smith
>            Priority: Minor
>
> Although there is a command-line flag to set 
> [transient_task_state_timeout|https://github.com/apache/incubator-aurora/blob/master/src/main/java/org/apache/aurora/scheduler/async/AsyncModule.java#L74],
> once set, the value is fixed and can only be changed via a manual flag 
> change and redeploy.
> We may want to look into a way of incorporating task rescheduling feedback 
> into the value we set for a timeout. There may also be a better approach than 
> a timeout when we're already in a bad state.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
