[ https://issues.apache.org/jira/browse/AURORA-121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Bill Farner updated AURORA-121:
-------------------------------

Description:

When {{TaskSchedulerImpl}} fails to find an open slot for a task, it falls back to the preemptor:

{code}
if (!offerQueue.launchFirst(getAssignerFunction(taskId, task))) {
  // Task could not be scheduled.
  maybePreemptFor(taskId);
  return TaskSchedulerResult.TRY_AGAIN;
}
{code}

This can be problematic when the task store is large (O(10k tasks)) and there is a steady supply of PENDING tasks that open slots cannot satisfy. It manifests as an overall degraded/slow scheduler and as logs of slow queries issued for preemption:

{noformat}
I0125 17:47:36.970 THREAD23 org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took 107 ms: TaskQuery(owner:null, environment:null, jobName:null, taskIds:null, statuses:[KILLING, ASSIGNED, STARTING, RUNNING, RESTARTING], slaveHost:null, instanceIds:null)
{noformat}

Several approaches come to mind to improve this situation (not mutually exclusive):
- (easy) More aggressively back off on tasks that cannot be satisfied (see the backoff/rate-limit sketch below)
- (easy) Fall back to preemption less frequently (also covered by the backoff/rate-limit sketch below)
- (easy) Gather the list of slaves from {{AttributeStore}} rather than {{TaskStore}}. This breaks the operation up into many smaller queries and reduces the amount of work when a match is found. However, it would create more work when a match is not found, so this approach is probably not helpful by itself.
- (harder) Scan for preemption candidates asynchronously, freeing up the TaskScheduler thread and the storage write lock. Scans could be kicked off by the task scheduler, ideally in a way that doesn't dogpile, and could be done in a weakly-consistent way to minimize contribution to storage contention (see the asynchronous-scan sketch below).
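To make the first two bullets concrete, here is a minimal sketch of the kind of guard the scheduling loop could consult before calling {{maybePreemptFor}}: per-task exponential backoff combined with a global rate limit on preemption scans. This is not Aurora code; the {{PreemptionThrottle}} class and its methods are hypothetical, and only the Guava {{RateLimiter}} API is real.

{code}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;

import com.google.common.util.concurrent.RateLimiter;

// Hypothetical guard; assumes it is only called from the single scheduling thread.
final class PreemptionThrottle {
  // Allow at most one preemption scan per second across all pending tasks (tunable).
  private final RateLimiter preemptionRate = RateLimiter.create(1.0);

  // Earliest time each task may trigger another scan, plus its current backoff.
  private final Map<String, Backoff> backoffs = new ConcurrentHashMap<>();

  private static final class Backoff {
    long notBeforeMillis = 0;
    long delayMillis = 100;
  }

  // Returns true if a preemption scan should run for this task right now.
  boolean tryAcquire(String taskId) {
    Backoff backoff = backoffs.computeIfAbsent(taskId, id -> new Backoff());
    long now = System.currentTimeMillis();
    if (now < backoff.notBeforeMillis) {
      return false;  // still backing off for this task
    }
    if (!preemptionRate.tryAcquire()) {
      return false;  // global preemption budget exhausted right now
    }
    // Push the next allowed attempt out exponentially, capped at one minute.
    backoff.notBeforeMillis = now + backoff.delayMillis;
    backoff.delayMillis = Math.min(backoff.delayMillis * 2, TimeUnit.MINUTES.toMillis(1));
    return true;
  }

  // Call once the task is finally assigned, so a later reschedule starts fresh.
  void reset(String taskId) {
    backoffs.remove(taskId);
  }
}
{code}

With something like this in place, the failing branch above would invoke {{maybePreemptFor(taskId)}} only when {{tryAcquire(taskId)}} returns true, and would still return {{TaskSchedulerResult.TRY_AGAIN}} either way.

For the last bullet, here is a minimal sketch of kicking off the candidate scan asynchronously without dogpiling: a single worker thread plus a compare-and-set flag so that requests arriving while a scan is in flight are simply dropped. Again this is not Aurora code; the {{PreemptionScanner}} interface is hypothetical and stands in for whatever actually walks the active tasks.

{code}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;

final class AsyncPreemptorTrigger {
  // Hypothetical: performs the (possibly weakly-consistent) scan for candidates.
  interface PreemptionScanner {
    void scanFor(String pendingTaskId);
  }

  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final AtomicBoolean scanInFlight = new AtomicBoolean(false);
  private final PreemptionScanner scanner;

  AsyncPreemptorTrigger(PreemptionScanner scanner) {
    this.scanner = scanner;
  }

  // Called from the scheduling loop in place of a synchronous maybePreemptFor.
  void requestScan(String pendingTaskId) {
    // Collapse concurrent requests: if a scan is already running, skip this one.
    // The pending task will come around again on its next scheduling round.
    if (!scanInFlight.compareAndSet(false, true)) {
      return;
    }
    executor.execute(() -> {
      try {
        scanner.scanFor(pendingTaskId);
      } finally {
        scanInFlight.set(false);
      }
    });
  }
}
{code}

Because the scan no longer runs on the TaskScheduler thread or while holding the storage write lock, it is free to read the task store in a weakly-consistent fashion, which is what the bullet suggests.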
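A note on the two sketches above: both are illustrations under stated assumptions rather than proposed patches, and the class, interface, and method names they introduce ({{PreemptionThrottle}}, {{AsyncPreemptorTrigger}}, {{PreemptionScanner}}) do not exist in the Aurora codebase.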
was:
When {{TaskSchedulerImpl}} fails to find an open slot for a task, it falls back to the preemptor:

{code}
if (!offerQueue.launchFirst(getAssignerFunction(taskId, task))) {
  // Task could not be scheduled.
  maybePreemptFor(taskId);
  return TaskSchedulerResult.TRY_AGAIN;
}
{code}

This can be problematic when the task store is large (O(10k tasks)) and there is a steady supply of PENDING tasks not satisfied by open slots. This will manifest as an overall degraded/slow scheduler, and logs of slow queries used for preemption:

{noformat}
I0125 17:47:36.970 THREAD23 org.apache.aurora.scheduler.storage.mem.MemTaskStore.fetchTasks: Query took 107 ms: TaskQuery(owner:null, environment:null, jobName:null, taskIds:null, statuses:[KILLING, ASSIGNED, STARTING, RUNNING, RESTARTING], slaveHost:null, instanceIds:null)
{noformat}

Several approaches come to mind to improve this situation:
- (easy) More aggressively back off on tasks that cannot be satisfied
- (easy) Fall back to preemption less frequently
- (harder) Scan for preemption candidates asynchronously, freeing up the TaskScheduler thread and the storage write lock. Scans could be kicked off by the task scheduler, ideally in a way that doesn't dogpile. This could also be done in a weakly-consistent way to minimally contribute to storage contention.


> Make the preemptor more efficient
> ---------------------------------
>
>                 Key: AURORA-121
>                 URL: https://issues.apache.org/jira/browse/AURORA-121
>             Project: Aurora
>          Issue Type: Story
>          Components: Scheduler
>            Reporter: Bill Farner
>

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)