[ https://issues.apache.org/jira/browse/AURORA-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599788#comment-14599788 ]

Maxim Khutornenko commented on AURORA-1370:
-------------------------------------------

We would likely still need TaskTimeout to catch legitimate tasks stuck in 
ASSIGNED or KILLING, but it would be prudent to consider making TaskTimeout a 
multistage filter. The first stage could use a short threshold (e.g. 1 minute) 
and trigger an explicit reconciliation. If things don't improve (e.g. the 
task is still stuck in KILLING), TaskTimeout would trigger the second stage and 
schedule a replacement.
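To make the two-stage idea concrete, here is a minimal sketch in Python, not Aurora's actual TaskTimeout (which lives in the Java scheduler). Everything in it is hypothetical: the class MultiStageTaskTimeout, the Action enum, the evaluate() method, and the default thresholds (60 s to reconcile, 300 s to replace) are assumptions chosen for illustration. The second stage only fires for a task whose first-stage reconciliation has already been requested, so reconciliation always gets a chance before a replacement is scheduled.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Set


class Action(Enum):
    """Hypothetical outcomes of evaluating one task against the timeout."""
    NONE = auto()        # nothing to do yet
    RECONCILE = auto()   # stage 1: ask for explicit reconciliation
    RESCHEDULE = auto()  # stage 2: give up and schedule a replacement


# Transient states named in the comment; the real scheduler may track more.
TRANSIENT_STATES = {"ASSIGNED", "KILLING"}


@dataclass
class MultiStageTaskTimeout:
    """Hypothetical two-stage timeout for tasks stuck in transient states."""
    reconcile_after_secs: float = 60.0    # assumed stage-1 threshold
    reschedule_after_secs: float = 300.0  # assumed stage-2 threshold
    # Task IDs for which stage 1 (reconciliation) has already been requested.
    _reconciled: Set[str] = field(default_factory=set, init=False, repr=False)

    def evaluate(self, task_id: str, state: str,
                 entered_state_at: float, now: float) -> Action:
        if state not in TRANSIENT_STATES:
            # Task left the transient state: forget any pending stage.
            self._reconciled.discard(task_id)
            return Action.NONE

        elapsed = now - entered_state_at

        # Stage 2 only after stage 1 has been tried and things did not improve.
        if elapsed >= self.reschedule_after_secs and task_id in self._reconciled:
            self._reconciled.discard(task_id)
            return Action.RESCHEDULE

        # Stage 1: request reconciliation once per stuck task.
        if elapsed >= self.reconcile_after_secs and task_id not in self._reconciled:
            self._reconciled.add(task_id)
            return Action.RECONCILE

        return Action.NONE
```

For example, a task that entered KILLING at t=0 yields NONE at t=30, RECONCILE at t=61, NONE again at t=120 (reconciliation is already pending), and RESCHEDULE at t=301. Keeping the "already reconciled" set means a slow periodic check that first sees a task well past both thresholds still reconciles before replacing it.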

> reconsider the behavior of transient task states now that we have task 
> reconciliation
> -------------------------------------------------------------------------------------
>
>                 Key: AURORA-1370
>                 URL: https://issues.apache.org/jira/browse/AURORA-1370
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: brian wickman
>
> Now that we have task reconciliation, it's less clear that transient task 
> states (e.g. KILLING) are necessary or should behave in the same way. We 
> have discussed things like extending the escalation timeout for the 
> executor's HTTP lifecycle workflow (i.e. /quitquitquit, /abortabortabort), but 
> doing so could conflict with transient task timeouts. If we could 
> relax the transient task timeout through task reconciliation, it may be 
> safer to give the task more flexibility.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
