On Mon, Nov 23, 2020 at 09:41:23AM +1100, Balbir Singh wrote: > On Tue, Nov 17, 2020 at 06:19:40PM -0500, Joel Fernandes (Google) wrote: > > From: Peter Zijlstra <[email protected]> > > > > The rationale is as follows. In the core-wide pick logic, even if > > need_sync == false, we need to go look at other CPUs (non-local CPUs) to > > see if they could be running RT. > > > > Say the RQs in a particular core look like this: > > Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task. > > > > rq0 rq1 > > CFS1 (tagged) RT1 (not tag) > > CFS2 (tagged) > > > > Say schedule() runs on rq0. Now, it will enter the above loop and > > pick_task(RT) will return NULL for 'p'. It will enter the above if() block > > and see that need_sync == false and will skip RT entirely. > > > > The end result of the selection will be (say prio(CFS1) > prio(CFS2)): > > rq0 rq1 > > CFS1 IDLE > > > > When it should have selected: > > rq0 r1 > > IDLE RT > > > > Joel saw this issue on real-world usecases in ChromeOS where an RT task > > gets constantly force-idled and breaks RT. Lets cure it. > > > > NOTE: This problem will be fixed differently in a later patch. It just > > kept here for reference purposes about this issue, and to make > > applying later patches easier. > > > > The changelog is hard to read, it refers to above if(), whereas there > is no code snippet in the changelog.
Yeah sorry, it comes from this email where I described the issue: http://lore.kernel.org/r/[email protected] I corrected the changelog and appended the patch below. Also pushed it to: https://git.kernel.org/pub/scm/linux/kernel/git/jfern/linux.git/log/?h=coresched > Also, from what I can see following > the series, p->core_cookie is not yet set anywhere (unless I missed it), > so fixing it in here did not make sense just reading the series. The interface patches for core_cookie are added later, that's how it is. The infrastructure comes first here. It would also not make sense to add interface first as well so I think the current ordering is fine. ---8<----------------------- From: Peter Zijlstra <[email protected]> Subject: [PATCH] sched: Fix priority inversion of cookied task with sibling The rationale is as follows. In the core-wide pick logic, even if need_sync == false, we need to go look at other CPUs (non-local CPUs) to see if they could be running RT. Say the RQs in a particular core look like this: Let CFS1 and CFS2 be 2 tagged CFS tags. Let RT1 be an untagged RT task. rq0 rq1 CFS1 (tagged) RT1 (not tag) CFS2 (tagged) The end result of the selection will be (say prio(CFS1) > prio(CFS2)): rq0 rq1 CFS1 IDLE When it should have selected: rq0 r1 IDLE RT Fix this issue by forcing need_sync and restarting the search if a cookied task was discovered. This will avoid this optimization from making incorrect picks. Joel saw this issue on real-world usecases in ChromeOS where an RT task gets constantly force-idled and breaks RT. Lets cure it. NOTE: This problem will be fixed differently in a later patch. It just kept here for reference purposes about this issue, and to make applying later patches easier. Reported-by: Joel Fernandes (Google) <[email protected]> Signed-off-by: Peter Zijlstra <[email protected]> Signed-off-by: Joel Fernandes (Google) <[email protected]> --- kernel/sched/core.c | 25 ++++++++++++++++--------- 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index 4ee4902c2cf5..53af817740c0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5195,6 +5195,7 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) need_sync = !!rq->core->core_cookie; /* reset state */ +reset: rq->core->core_cookie = 0UL; if (rq->core->core_forceidle) { need_sync = true; @@ -5242,14 +5243,8 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) /* * If there weren't no cookies; we don't need to * bother with the other siblings. - * If the rest of the core is not running a tagged - * task, i.e. need_sync == 0, and the current CPU - * which called into the schedule() loop does not - * have any tasks for this class, skip selecting for - * other siblings since there's no point. We don't skip - * for RT/DL because that could make CFS force-idle RT. */ - if (i == cpu && !need_sync && class == &fair_sched_class) + if (i == cpu && !need_sync) goto next_class; continue; @@ -5259,7 +5254,20 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) * Optimize the 'normal' case where there aren't any * cookies and we don't need to sync up. */ - if (i == cpu && !need_sync && !p->core_cookie) { + if (i == cpu && !need_sync) { + if (p->core_cookie) { + /* + * This optimization is only valid as + * long as there are no cookies + * involved. We may have skipped + * non-empty higher priority classes on + * siblings, which are empty on this + * CPU, so start over. + */ + need_sync = true; + goto reset; + } + next = p; goto done; } @@ -5299,7 +5307,6 @@ pick_next_task(struct rq *rq, struct task_struct *prev, struct rq_flags *rf) */ need_sync = true; } - } } next_class:; -- 2.29.2.454.gaff20da3a2-goog

