[
https://issues.apache.org/jira/browse/AURORA-1370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14599788#comment-14599788
]
Maxim Khutornenko commented on AURORA-1370:
-------------------------------------------
We would likely still need TaskTimeout to catch legitimate tasks stuck in
ASSIGNED or KILLING, but it would be prudent to consider making TaskTimeout a
multistage filter. The first stage could use a short timeout (e.g. 1 minute)
and trigger an explicit reconciliation. If things don't improve (e.g. the
task is still stuck in KILLING), TaskTimeout would trigger the second stage
and schedule a replacement.
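
To make the two-stage idea concrete, here is a minimal sketch of the decision
logic. It is not the scheduler's actual TaskTimeout code: the class name, the
Action values, and the 5 minute second-stage threshold are all hypothetical,
and only the 1 minute first stage comes from the suggestion above.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"            # task is fine, or not in a transient state
    RECONCILE = "reconcile"  # stage 1: ask for explicit reconciliation
    REPLACE = "replace"      # stage 2: give up and schedule a replacement


# Transient states named in this thread; the real set may be larger.
TRANSIENT_STATES = frozenset({"ASSIGNED", "KILLING"})


@dataclass(frozen=True)
class TwoStageTimeout:
    """Hypothetical two-stage timeout policy for transient task states."""

    reconcile_after_secs: float = 60.0  # first stage, "e.g. 1 minute"
    replace_after_secs: float = 300.0   # second stage, assumed value

    def evaluate(self, state: str, secs_in_state: float) -> Action:
        """Return the action for a task that has spent secs_in_state in state."""
        if state not in TRANSIENT_STATES:
            return Action.NONE
        # Check the later stage first so it takes precedence once reached.
        if secs_in_state >= self.replace_after_secs:
            return Action.REPLACE
        if secs_in_state >= self.reconcile_after_secs:
            return Action.RECONCILE
        return Action.NONE
```

With these defaults, a task that has been in KILLING for 90 seconds would get
a reconciliation request, and the same task at 6 minutes would be replaced.
A real implementation would also need to avoid re-requesting reconciliation
on every evaluation pass.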
> reconsider the behavior of transient task states now that we have task
> reconciliation
> -------------------------------------------------------------------------------------
>
> Key: AURORA-1370
> URL: https://issues.apache.org/jira/browse/AURORA-1370
> Project: Aurora
> Issue Type: Story
> Components: Scheduler
> Reporter: brian wickman
>
> Now that we have task reconciliation, it's less clear that transient task
> states (e.g. KILLING) are necessary or should behave in the same way. We
> have discussed things like extending the escalation timeout for the
> executor's HTTP lifecycle workflow (i.e. /quitquitquit, /abortabortabort) but
> doing so would possibly conflict with transient task timeouts. If we could
> relax the transient task timeout through task reconciliation, then it may be
> safer to give more flexibility to the task.