[
https://issues.apache.org/jira/browse/AURORA-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137911#comment-15137911
]
Stephan Erb commented on AURORA-1500:
-------------------------------------
The relevant piece of code responsible for untraceable deletes of PENDING tasks:
https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java#L442
(thanks to Maxim for pointing it out :-)
> Platform SLA gets stuck in DOWN when a replacement PENDING is killed
> --------------------------------------------------------------------
>
> Key: AURORA-1500
> URL: https://issues.apache.org/jira/browse/AURORA-1500
> Project: Aurora
> Issue Type: Bug
> Components: Scheduler
> Reporter: Maxim Khutornenko
>
> The way platform SLA calculation is currently done cannot account for some
> special cases when killed tasks don't leave any history behind. One example:
> a task gets LOST (SLA DOWN interval starts) and its replacement is scheduled
> immediately. If, however, the replacement task gets killed while still in
> PENDING, no history is left to close the DOWN interval and the platform SLA
> is degraded until either a new matching instance task is created by the user or
> the task history is purged.
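
To make the failure mode in the description concrete, here is a minimal sketch of it. It is not Aurora's SLA code: the class `PlatformSlaSketch`, the `Event` type, and the methods `endsDown` and `killPending` are all hypothetical names, and the rule that a recorded RUNNING or KILLED event closes a DOWN interval is an assumption made for illustration. The only behavior taken from the issue is that killing a PENDING task removes it without leaving any history.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model for illustration only; not Aurora's actual SLA calculation.
public class PlatformSlaSketch {
    enum Status { PENDING, RUNNING, LOST, KILLED }

    // One recorded state transition of a task belonging to a single instance.
    static final class Event {
        final String taskId;
        final Status status;

        Event(String taskId, Status status) {
            this.taskId = taskId;
            this.status = status;
        }
    }

    // Replays the history and reports whether the instance is still counted as DOWN.
    // Assumption: LOST opens a DOWN interval; a recorded RUNNING or KILLED closes it.
    static boolean endsDown(List<Event> history) {
        boolean down = false;
        for (Event e : history) {
            switch (e.status) {
                case LOST:
                    down = true;
                    break;
                case RUNNING:
                case KILLED:
                    down = false;
                    break;
                default:
                    break; // PENDING alone neither opens nor closes an interval
            }
        }
        return down;
    }

    // Killing a PENDING task deletes it outright, so none of its events survive.
    static void killPending(List<Event> history, String taskId) {
        history.removeIf(e -> e.taskId.equals(taskId));
    }

    public static void main(String[] args) {
        List<Event> history = new ArrayList<>();
        history.add(new Event("task-1", Status.RUNNING));
        history.add(new Event("task-1", Status.LOST));    // DOWN interval starts
        history.add(new Event("task-2", Status.PENDING)); // replacement scheduled
        killPending(history, "task-2");                   // replacement vanishes
        System.out.println("stuck DOWN: " + endsDown(history)); // prints "stuck DOWN: true"
    }
}
```

In this sketch the interval only closes once a later event for the same instance is recorded, which matches the description above: the SLA recovers when a new matching instance task is created or when the LOST task's history is purged.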