[
https://issues.apache.org/jira/browse/AURORA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16035170#comment-16035170
]
Kai Huang commented on AURORA-1929:
-----------------------------------
https://reviews.apache.org/r/59699/
> Improve explicit task history pruning.
> --------------------------------------
>
> Key: AURORA-1929
> URL: https://issues.apache.org/jira/browse/AURORA-1929
> Project: Aurora
> Issue Type: Task
> Components: Scheduler
> Reporter: Kai Huang
> Assignee: Kai Huang
> Priority: Minor
>
> There are currently two types of task history pruning run by Aurora:
> # Implicit task history pruning, run in the background by
> TaskHistoryPruner, which registers every inactive task for pruning when it
> reaches a terminal state.
> # Explicit task history pruning, initiated by the `aurora_admin prune_tasks`
> command, which prunes inactive tasks in the cluster.
> The prune_tasks endpoint seems to be very slow when the cluster has a large
> number of inactive tasks.
> For example, when we run $ aurora_admin prune_tasks against 135k running
> tasks (1k jobs), it takes about 30 minutes to prune all tasks; the pruning
> speed seems to max out at 3k tasks per minute.
> Currently, Aurora uses StreamManager to manage a single log stream append
> transaction for task history pruning. Local storage ops can be added to the
> transaction and then later committed as an atomic unit. However, the
> StateManager removes tasks one by one in a for-loop
> (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L376),
> and each RemoveTasks operation is coalesced with the previous operation,
> which seems inefficient and unnecessary
> (https://github.com/apache/aurora/blob/c85bffdd6f68312261697eee868d57069adda434/src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java#L324).
> We need to batch all RemoveTasks operations and execute them all at once to
> avoid the additional cost of coalescing. The fix will also benefit implicit
> task history pruning, since it has a similar underlying implementation.
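
The cost difference described above can be illustrated with a minimal sketch. This is not Aurora's code: `PruneBatchingSketch`, `ToyTransaction`, `addRemoveTasks`, and the `idsCopied` counter are hypothetical names, and the sketch assumes that coalescing rebuilds the merged set of task IDs every time a RemoveTasks op is added to the transaction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; these types do not exist in Aurora.
class PruneBatchingSketch {

  /** Toy transaction that coalesces every new removal into the pending one. */
  static final class ToyTransaction {
    private Set<String> pendingRemove = null;
    private long idsCopied = 0;  // counts IDs copied while building merged sets

    void addRemoveTasks(Set<String> taskIds) {
      // Assumption: coalescing builds a fresh set from the prior op plus the new op.
      Set<String> merged = new LinkedHashSet<>();
      if (pendingRemove != null) {
        merged.addAll(pendingRemove);
        idsCopied += pendingRemove.size();
      }
      merged.addAll(taskIds);
      idsCopied += taskIds.size();
      pendingRemove = merged;
    }

    long idsCopied() {
      return idsCopied;
    }
  }

  /** One RemoveTasks op per task, as in the for-loop described in the issue. */
  static long copiesOneByOne(List<String> taskIds) {
    ToyTransaction tx = new ToyTransaction();
    for (String id : taskIds) {
      tx.addRemoveTasks(Collections.singleton(id));
    }
    return tx.idsCopied();
  }

  /** A single RemoveTasks op carrying every task ID. */
  static long copiesBatched(List<String> taskIds) {
    ToyTransaction tx = new ToyTransaction();
    tx.addRemoveTasks(new LinkedHashSet<>(taskIds));
    return tx.idsCopied();
  }

  /** Generates n distinct placeholder task IDs. */
  static List<String> makeIds(int n) {
    List<String> ids = new ArrayList<>(n);
    for (int i = 0; i < n; i++) {
      ids.add("task-" + i);
    }
    return ids;
  }

  public static void main(String[] args) {
    List<String> ids = makeIds(1000);
    System.out.println("one-by-one copies: " + copiesOneByOne(ids));
    System.out.println("batched copies:    " + copiesBatched(ids));
  }
}
```

Under that assumption, removing n tasks one by one copies n(n+1)/2 IDs in total, while a single batched op copies n, which is why batching should matter more as the number of inactive tasks grows.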
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)