[ 
https://issues.apache.org/jira/browse/AURORA-1929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Kumar Shanmugham updated AURORA-1929:
----------------------------------------------
    Fix Version/s: 0.18.0

> Improve explicit task history pruning.
> --------------------------------------
>
>                 Key: AURORA-1929
>                 URL: https://issues.apache.org/jira/browse/AURORA-1929
>             Project: Aurora
>          Issue Type: Task
>          Components: Scheduler
>            Reporter: Kai Huang
>            Assignee: Kai Huang
>            Priority: Minor
>             Fix For: 0.18.0
>
>
> There are currently two types of task history pruning run by Aurora:
> # The implicit task history pruning run by TaskHistoryPruner in the 
> background, which registers all inactive tasks upon terminal state change for 
> pruning.
> # The explicit task history pruning initiated by `aurora_admin prune_tasks` 
> command, which prunes inactive tasks in the cluster.
> The prune_tasks endpoint seems to be very slow when the cluster has a large 
> number of inactive tasks. 
> For example, when we run `aurora_admin prune_tasks` for 135k running tasks 
> (1k jobs), it takes about 30 minutes to prune all tasks; the pruning speed 
> seems to max out at 3k tasks per minute.
> Currently, Aurora uses StreamManager to manage a single log stream append 
> transaction for task history pruning. Local storage ops can be added to the 
> transaction and then later committed as an atomic unit. However, the 
> StateManager removes tasks one by one in a for-loop 
> (https://github.com/apache/aurora/blob/master/src/main/java/org/apache/aurora/scheduler/state/StateManagerImpl.java#L376), 
> and each RemoveTasks operation is coalesced with its previous operation, 
> which seems inefficient and unnecessary 
> (https://github.com/apache/aurora/blob/c85bffdd6f68312261697eee868d57069adda434/src/main/java/org/apache/aurora/scheduler/storage/log/StreamManagerImpl.java#L324).
> We need to batch all removeTasks operations and execute them all at once to 
> avoid the additional cost of coalescing. The fix will also benefit implicit task 
> history pruning, since it has a similar underlying implementation.
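
The batching idea in the description can be illustrated with a small, self-contained sketch. This is not Aurora's code: `ToyTransaction`, `addRemoveTasks`, `removeOneByOne`, and `removeBatched` are hypothetical names, and the coalescing rule (merge every remove op into the previous one) is a simplifying assumption made only to count how often coalescing runs in each approach.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class BatchedPruneSketch {

    /** Hypothetical stand-in for a log transaction that coalesces adjacent remove ops. */
    static final class ToyTransaction {
        private final List<Set<String>> ops = new ArrayList<>();
        long coalesceCalls = 0;

        /** Adds a remove op; if a previous op exists, merges into it (assumed rule). */
        void addRemoveTasks(Set<String> taskIds) {
            if (ops.isEmpty()) {
                ops.add(new LinkedHashSet<>(taskIds));
            } else {
                ops.get(ops.size() - 1).addAll(taskIds);
                coalesceCalls++;
            }
        }

        int opCount() {
            return ops.size();
        }

        int removedCount() {
            int total = 0;
            for (Set<String> op : ops) {
                total += op.size();
            }
            return total;
        }
    }

    /** Mirrors the one-task-per-op loop the issue describes: one coalesce per extra task. */
    static ToyTransaction removeOneByOne(List<String> taskIds) {
        ToyTransaction tx = new ToyTransaction();
        for (String id : taskIds) {
            tx.addRemoveTasks(Collections.singleton(id));
        }
        return tx;
    }

    /** Proposed shape: collect all IDs first, then add a single remove op. */
    static ToyTransaction removeBatched(List<String> taskIds) {
        ToyTransaction tx = new ToyTransaction();
        tx.addRemoveTasks(new LinkedHashSet<>(taskIds));
        return tx;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            ids.add("task-" + i);
        }
        ToyTransaction oneByOne = removeOneByOne(ids);
        ToyTransaction batched = removeBatched(ids);
        System.out.println("one-by-one coalesce calls: " + oneByOne.coalesceCalls);
        System.out.println("batched coalesce calls:    " + batched.coalesceCalls);
    }
}
```

In this toy model both approaches end with a single remove op covering the same task IDs, but the batched version never enters the coalescing path. Whether the real speedup is proportional depends on what coalescing actually costs in StreamManagerImpl, which this sketch does not measure.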



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
