[
https://issues.apache.org/jira/browse/AURORA-1837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15862230#comment-15862230
]
Mehrdad Nurolahzade commented on AURORA-1837:
---------------------------------------------
The current implementation is also inefficient in that it tries to delete an
expired task multiple times. The asynchronous nature of {{BatchWorker}}, which
is used to process task deletions, introduces some delay between delete enqueue
and delete execution. As a result, tasks already deemed deleted in a previous
evaluation round might get picked up, evaluated and enqueued for deletion
multiple times (note that {{deleteTasks.deleteTasks()}} does not fail if a task
is not found).
This is evident in the {{tasks_pruned}} metric, which reports numbers much
higher than the actual number of expired tasks deleted.
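The effect described above can be reproduced with a small model. This is only a sketch under stated assumptions: the class {{PruneOverCountDemo}}, its fields and the optional {{pending}} guard are hypothetical and are not taken from the Aurora code base. The counter stands in for {{tasks_pruned}} and the queue stands in for the delay introduced by {{BatchWorker}}.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model of the behaviour; none of these names exist in Aurora.
public class PruneOverCountDemo {
  final Set<String> store = new LinkedHashSet<>();       // expired tasks still in storage
  final Queue<String> deleteQueue = new ArrayDeque<>();  // stands in for the async worker queue
  final Set<String> pending = new HashSet<>();           // only consulted when dedupe is on
  final boolean dedupe;
  int tasksPruned = 0;      // stand-in for the tasks_pruned metric
  int actuallyDeleted = 0;  // deletions that really removed a task

  PruneOverCountDemo(boolean dedupe) {
    this.dedupe = dedupe;
  }

  // One evaluation round: every expired task still in storage is enqueued again.
  void evaluate() {
    for (String id : store) {
      if (dedupe && !pending.add(id)) {
        continue;  // already enqueued in an earlier round, skip it
      }
      deleteQueue.add(id);
      tasksPruned++;
    }
  }

  // Delayed execution of the queued deletes; deleting a missing task is a silent no-op.
  void drain() {
    String id;
    while ((id = deleteQueue.poll()) != null) {
      if (store.remove(id)) {
        actuallyDeleted++;
      }
      pending.remove(id);
    }
  }

  // Three evaluation rounds run before the queued deletes execute.
  public static int[] run(boolean dedupe) {
    PruneOverCountDemo demo = new PruneOverCountDemo(dedupe);
    demo.store.add("task-1");
    demo.store.add("task-2");
    demo.evaluate();
    demo.evaluate();
    demo.evaluate();
    demo.drain();
    return new int[] {demo.tasksPruned, demo.actuallyDeleted};
  }

  public static void main(String[] args) {
    int[] naive = run(false);
    int[] guarded = run(true);
    System.out.println("naive:   counted=" + naive[0] + " deleted=" + naive[1]);
    System.out.println("guarded: counted=" + guarded[0] + " deleted=" + guarded[1]);
  }
}
```

Without the guard, the counter is incremented once per task per round while each task is deleted only once. Tracking pending deletes is just one possible way to keep the two numbers in line; it is not necessarily the approach the fix should take.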
> Improve task history pruning
> ----------------------------
>
> Key: AURORA-1837
> URL: https://issues.apache.org/jira/browse/AURORA-1837
> Project: Aurora
> Issue Type: Task
> Reporter: Reza Motamedi
> Assignee: Mehrdad Nurolahzade
> Priority: Minor
> Labels: scheduler
>
> The current implementation of {{TaskHistoryPruner}} registers all inactive
> tasks for pruning upon terminal _state_ change.
> {{TaskHistoryPruner::registerInactiveTask()}} uses a delay executor to
> schedule the pruning of tasks. However, we have noticed that most of the
> pruning takes place after the scheduler recovers from a fail-over.
> Modify {{TaskHistoryPruner}} to follow a design similar to
> {{JobUpdateHistoryPruner}}:
> # Instead of registering delay executors upon terminal task state
transitions, have it wake up at preconfigured intervals, find all terminal
state tasks that meet the pruning criteria and delete them.
> # Make the initial task history pruning delay configurable so that it does
not hamper the scheduler on startup.
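
The two proposed changes can be sketched roughly as follows. This is an illustration only, not the {{JobUpdateHistoryPruner}} design itself: the class {{PeriodicPrunerSketch}}, its fields and the age-based retention criterion are assumptions made for the example. The scheduling call is the standard {{ScheduledExecutorService.scheduleWithFixedDelay()}} from the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; class, field and parameter names are illustrative only.
public class PeriodicPrunerSketch {
  // Task id -> time (ms) at which the task reached a terminal state.
  final Map<String, Long> terminalTasks = new ConcurrentHashMap<>();
  final long retentionMs;  // assumed pruning criterion: age since terminal state

  PeriodicPrunerSketch(long retentionMs) {
    this.retentionMs = retentionMs;
  }

  // One pruning pass: find every terminal task past retention and delete it.
  List<String> pruneOnce(long nowMs) {
    List<String> pruned = new ArrayList<>();
    for (Map.Entry<String, Long> entry : terminalTasks.entrySet()) {
      if (nowMs - entry.getValue() >= retentionMs) {
        pruned.add(entry.getKey());
      }
    }
    terminalTasks.keySet().removeAll(pruned);
    return pruned;
  }

  // Wake up at a fixed interval; the first pass waits for the configurable initial delay.
  ScheduledExecutorService start(long initialDelayMs, long intervalMs) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(
        () -> pruneOnce(System.currentTimeMillis()),
        initialDelayMs,
        intervalMs,
        TimeUnit.MILLISECONDS);
    return executor;
  }

  public static void main(String[] args) {
    PeriodicPrunerSketch pruner = new PeriodicPrunerSketch(1000L);
    pruner.terminalTasks.put("old-task", 0L);
    pruner.terminalTasks.put("recent-task", 4500L);
    // A single pass at t=5000 ms, run directly instead of through the executor.
    System.out.println("pruned: " + pruner.pruneOnce(5000L));
  }
}
```

Because each pass reads the set of terminal tasks at the moment it runs, no per-task timer has to be registered on state transitions, and the initial delay keeps the first pass away from scheduler startup.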
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)