On Thu, May 21, 2020 at 07:14:26PM -0400, Joel Fernandes wrote: > On Wed, Mar 04, 2020 at 04:59:57PM +0000, vpillai wrote: > > From: Peter Zijlstra <pet...@infradead.org> > > > > Instead of only selecting a local task, select a task for all SMT > > siblings for every reschedule on the core (irrespective which logical > > CPU does the reschedule). > > > > There could be races in core scheduler where a CPU is trying to pick > > a task for its sibling in core scheduler, when that CPU has just been > > offlined. We should not schedule any tasks on the CPU in this case. > > Return an idle task in pick_next_task for this situation. > > > > NOTE: there is still potential for siblings rivalry. > > NOTE: this is far too complicated; but thus far I've failed to > > simplify it further. > > > > Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org> > > Signed-off-by: Julien Desfossez <jdesfos...@digitalocean.com> > > Signed-off-by: Vineeth Remanan Pillai <vpil...@digitalocean.com> > > Signed-off-by: Aaron Lu <aaron...@linux.alibaba.com> > > Signed-off-by: Tim Chen <tim.c.c...@linux.intel.com> > > --- > > kernel/sched/core.c | 274 ++++++++++++++++++++++++++++++++++++++++++- > > kernel/sched/fair.c | 40 +++++++ > > kernel/sched/sched.h | 6 +- > > 3 files changed, 318 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > index 445f0d519336..9a1bd236044e 100644 > > --- a/kernel/sched/core.c > > +++ b/kernel/sched/core.c > > @@ -4253,7 +4253,7 @@ static inline void schedule_debug(struct task_struct > > *prev, bool preempt) > > * Pick up the highest-prio task: > > */ > > static inline struct task_struct * > > -pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags > > *rf) > > +__pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags > > *rf) > > { > > const struct sched_class *class; > > struct task_struct *p; > > @@ -4309,6 +4309,273 @@ pick_next_task(struct rq *rq, struct task_struct > > *prev, struct rq_flags *rf) > > BUG(); > > } > > > > +#ifdef CONFIG_SCHED_CORE > > + > > +static inline bool cookie_equals(struct task_struct *a, unsigned long > > cookie) > > +{ > > + return is_idle_task(a) || (a->core_cookie == cookie); > > +} > > + > > +static inline bool cookie_match(struct task_struct *a, struct task_struct > > *b) > > +{ > > + if (is_idle_task(a) || is_idle_task(b)) > > + return true; > > + > > + return a->core_cookie == b->core_cookie; > > +} > > + > > +// XXX fairness/fwd progress conditions > > +/* > > + * Returns > > + * - NULL if there is no runnable task for this class. > > + * - the highest priority task for this runqueue if it matches > > + * rq->core->core_cookie or its priority is greater than max. > > + * - Else returns idle_task. > > + */ > > +static struct task_struct * > > +pick_task(struct rq *rq, const struct sched_class *class, struct > > task_struct *max) > > +{ > > + struct task_struct *class_pick, *cookie_pick; > > + unsigned long cookie = rq->core->core_cookie; > > + > > + class_pick = class->pick_task(rq); > > + if (!class_pick) > > + return NULL; > > + > > + if (!cookie) { > > + /* > > + * If class_pick is tagged, return it only if it has > > + * higher priority than max. > > + */ > > + if (max && class_pick->core_cookie && > > + prio_less(class_pick, max)) > > + return idle_sched_class.pick_task(rq); > > + > > + return class_pick; > > + } > > + > > + /* > > + * If class_pick is idle or matches cookie, return early. > > + */ > > + if (cookie_equals(class_pick, cookie)) > > + return class_pick; > > + > > + cookie_pick = sched_core_find(rq, cookie); > > + > > + /* > > + * If class > max && class > cookie, it is the highest priority task on > > + * the core (so far) and it must be selected, otherwise we must go with > > + * the cookie pick in order to satisfy the constraint. > > + */ > > + if (prio_less(cookie_pick, class_pick) && > > + (!max || prio_less(max, class_pick))) > > + return class_pick; > > + > > + return cookie_pick; > > +} > > I've been hating on this pick_task() routine for a while now :-). If we add > the task to the tag tree as Peter suggested at OSPM for that other issue > Vineeth found, it seems it could be simpler. > > This has just been near a compiler so far but how about:
Discussed a lot with Vineeth. Below is an improved version of the pick_task() similification. It also handles the following "bug" in the existing code as well that Vineeth brought up in OSPM: Suppose 2 siblings of a core: rq 1 and rq 2. In priority order (high to low), say we have the tasks: A - untagged (rq 1) B - tagged (rq 2) C - untagged (rq 2) Say, B and C are in the same scheduling class. When the pick_next_task() loop runs, it looks at rq 1 and max is A, A is tenantively selected for rq 1. Then it looks at rq 2 and the class_pick is B. But that's not compatible with A. So rq 2 gets forced idle. In reality, rq 2 could have run C instead of idle. The fix is to add C to the tag tree as Peter suggested in OSPM. Updated diff below: ---8<----------------------- diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 005d7f7323e2d..625377f393ed3 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -182,9 +182,6 @@ static void sched_core_enqueue(struct rq *rq, struct task_struct *p) rq->core->core_task_seq++; - if (!p->core_cookie) - return; - node = &rq->core_tree.rb_node; parent = *node; @@ -215,7 +212,7 @@ static void sched_core_dequeue(struct rq *rq, struct task_struct *p) void sched_core_add(struct rq *rq, struct task_struct *p) { - if (p->core_cookie && task_on_rq_queued(p)) + if (task_on_rq_queued(p)) sched_core_enqueue(rq, p); } @@ -4556,43 +4553,57 @@ void sched_core_irq_exit(void) static struct task_struct * pick_task(struct rq *rq, const struct sched_class *class, struct task_struct *max) { - struct task_struct *class_pick, *cookie_pick; + struct task_struct *class_pick, *cookie_pick, *rq_pick; unsigned long cookie = rq->core->core_cookie; class_pick = class->pick_task(rq); if (!class_pick) return NULL; - if (!cookie) { - /* - * If class_pick is tagged, return it only if it has - * higher priority than max. - */ - if (max && class_pick->core_cookie && - prio_less(class_pick, max)) - return idle_sched_class.pick_task(rq); + if (!max) + return class_pick; + + /* Make sure the current max's cookie is core->core_cookie */ + WARN_ON_ONCE(max->core_cookie != cookie); + /* Try to play really nice: see if the class's cookie works. */ + if (cookie_equals(class_pick, cookie)) return class_pick; - } /* - * If class_pick is idle or matches cookie, return early. + * From here on, we must return class_pick, cookie_pick or idle. + * Following are the cases: + * 1 - lowest prio. + * 3 - highest prio. + * + * max class cookie outcome + * 1 2 3 cookie + * 1 3 2 class + * 2 1 3 cookie + * 2 3 1 class + * 3 1 2 cookie + * 3 2 1 cookie + * 3 2 - return idle (when no cookie task). */ - if (cookie_equals(class_pick, cookie)) - return class_pick; + /* First try to find the highest prio of (cookie, class and max). */ cookie_pick = sched_core_find(rq, cookie); + if (cookie_pick && prio_less(class_pick, cookie_pick)) + rq_pick = cookie_pick; + else + rq_pick = class_pick; + if (prio_less(max, rq_pick)) + return rq_pick; + + /* If we max was greatest, then see if there was a cookie. */ + if (cookie_pick) + return cookie_pick; /* - * If class > max && class > cookie, it is the highest priority task on - * the core (so far) and it must be selected, otherwise we must go with - * the cookie pick in order to satisfy the constraint. + * We get here with if class_pick was incompatible with max + * and lower prio than max. So we have nothing. */ - if (prio_less(cookie_pick, class_pick) && - (!max || prio_less(max, class_pick))) - return class_pick; - - return cookie_pick; + return idle_sched_class.pick_task(rq); } static struct task_struct *