Hi,

Unsurprisingly, the scheduler revamp that was committed last week has bugs.
The main issue is that threads could go to sleep while there are still tasks
to run; there are also fairness issues between the global runqueue and the
local ones. The attached patches should fix that.
Regards,
Olivier
>From f47ca20747c1cfc7b9e6413afe9c8819a84e485a Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 28 May 2018 13:51:06 +0200
Subject: [PATCH 1/3] BUG/MEDIUM: tasks: Don't forget to increase/decrease
 tasks_run_queue.

Don't forget to increase tasks_run_queue when we're adding a task to
the tasklet list, and to decrease it when we remove a task from a
runqueue, or its value won't be accurate, and could lead to tasks not
being executed when put in the global run queue.
---
 include/proto/task.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/proto/task.h b/include/proto/task.h
index c8004143b..ef296f6e9 100644
--- a/include/proto/task.h
+++ b/include/proto/task.h
@@ -177,6 +177,7 @@ static inline struct task *task_unlink_wq(struct task *t)
  */
 static inline struct task *__task_unlink_rq(struct task *t)
 {
+	HA_ATOMIC_SUB(&tasks_run_queue, 1);
 	eb32sc_delete(&t->rq);
 	if (likely(t->nice))
 		HA_ATOMIC_SUB(&niced_tasks, 1);
@@ -219,6 +220,7 @@ static inline void task_insert_into_tasklet_list(struct task *t)
 	 */
 	if (unlikely(!HA_ATOMIC_CAS(&t->rq.node.leaf_p, &expected, 0x1)))
 		return;
+	HA_ATOMIC_ADD(&tasks_run_queue, 1);
 	task_list_size[tid]++;
 	tl = (struct tasklet *)t;
 	LIST_ADDQ(&task_list[tid], &tl->list);
-- 
2.14.3
>From 4f28192f2cf16920fbe69e4077a8b948c3bd8f4b Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 28 May 2018 14:53:08 +0200
Subject: [PATCH 2/3] MINOR: task: Also consider the task list size when
 getting global tasks.

We're taking tasks from the global runqueue based on the number of
tasks the thread already has in its local runqueue, but now that we
have a task list, we also have to take that into account.
---
 src/task.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/task.c b/src/task.c
index 327518896..f6bf9156d 100644
--- a/src/task.c
+++ b/src/task.c
@@ -260,7 +260,7 @@ void process_runnable_tasks()
 	 * much elements from the global list as to have a bigger local queue
 	 * than the average.
 	 */
-	while (rqueue_size[tid] <= average) {
+	while ((task_list_size[tid] + rqueue_size[tid]) <= average) {
 		/* we have to restart looking up after every batch */
 		rq_next = eb32sc_lookup_ge(&rqueue, rqueue_ticks - TIMER_LOOK_BACK, tid_bit);
-- 
2.14.3
>From beff1e28af2c1b7529dd94586f881ac81a04c797 Mon Sep 17 00:00:00 2001
From: Olivier Houchard <[email protected]>
Date: Mon, 28 May 2018 14:54:49 +0200
Subject: [PATCH 3/3] BUG/MEDIUM: task: Don't forget to decrement max_processed
 after each task.

When the task list was introduced, we bogusly lost max_processed--,
which meant we would execute as many tasks as were present in the
list, and we would never set active_tasks_mask, so the thread would
go to sleep even if more tasks were left to be executed.
---
 src/task.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/task.c b/src/task.c
index f6bf9156d..fb4840738 100644
--- a/src/task.c
+++ b/src/task.c
@@ -367,6 +367,7 @@ void process_runnable_tasks()
 			task_queue(t);
 		}
 
+		max_processed--;
 		if (max_processed <= 0) {
 			active_tasks_mask |= tid_bit;
 			activity[tid].long_rq++;
-- 
2.14.3

