On Tue, 20 Apr 2021 at 17:20, Rik van Riel <r...@surriel.com> wrote:
>
> On Tue, 2021-04-20 at 11:04 +0200, Vincent Guittot wrote:
> > On Mon, 19 Apr 2021 at 18:51, Rik van Riel <r...@surriel.com> wrote:
> > >
> > > @@ -10688,7 +10697,7 @@ static int newidle_balance(struct rq
> > > *this_rq, struct rq_flags *rf)
> > >         if (this_rq->nr_running != this_rq->cfs.h_nr_running)
> > >                 pulled_task = -1;
> > >
> > > -       if (pulled_task)
> > > +       if (pulled_task || this_rq->ttwu_pending)
> >
> > This needs at least a comment to explain why we must clear
> > this_rq->idle_stamp when this_rq->ttwu_pending is set whereas it is
> > also done during sched_ttwu_pending()
> >
> > >                 this_rq->idle_stamp = 0;
>
> I spent some time staring at sched_ttwu_pending and
> the functions it calls, but I can't seem to spot
> where it clears rq->idle_stamp, except inside
> ttwu_do_wakeup where it will end up adding a
> non-idle period into the rq->avg_idle, which seems
> wrong.
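For reference, the accounting you are pointing at in ttwu_do_wakeup()
looks roughly like this (a sketch from memory, not a verbatim excerpt):

        if (rq->idle_stamp) {
                u64 delta = rq_clock(rq) - rq->idle_stamp;
                u64 max = 2*rq->max_idle_balance_cost;

                /* update_avg() moves avg_idle 1/8th of the way towards delta */
                update_avg(&rq->avg_idle, delta);
                if (rq->avg_idle > max)
                        rq->avg_idle = max;

                /* the period that started at idle_stamp is now accounted */
                rq->idle_stamp = 0;
        }

So a wakeup that arrives shortly after idle_stamp was set folds a very
small delta into avg_idle.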
Not sure that this is really wrong, because it ends up scheduling the
idle task, which is immediately preempted. But the preemption happens
in the idle task, doesn't it?

> If we are actually idle, and get woken up with a
> ttwu_queue task, we do not come through newidle_balance,
> and we end up counting the idle time into the avg_idle
> number.
>
> However, if a task is woken up while the CPU is
> in newidle_balance, because prev != idle, we should
> not count that period towards rq->avg_idle, for
> the same reason we do so when we pulled a task.

As mentioned above, we have effectively scheduled the idle task in
your case, whereas we don't in the other cases. IIUC, your problem
comes from rq->avg_idle decreasing a lot in such cases. And because
rq->avg_idle is used to decide whether you have time to run
newidle_balance, you skip it more often (a rough sketch of that check
is at the end of this mail).

> I'll add a comment in v3 explaining why idle_stamp
> needs to be 0.

Yes please.

> --
> All Rights Reversed.
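The check I refer to above sits near the top of newidle_balance() and
looks roughly like this (again a sketch from memory, not a verbatim
excerpt):

        /* stamp the beginning of the (expected) idle period */
        this_rq->idle_stamp = rq_clock(this_rq);

        /* skip the whole balance when the expected idle time is too short */
        if (this_rq->avg_idle < sysctl_sched_migration_cost ||
            !READ_ONCE(this_rq->rd->overload))
                goto out;

        /* and, per sched domain, stop once the cost exceeds the budget */
        if (this_rq->avg_idle < curr_cost + sd->max_newidle_lb_cost)
                break;

So once avg_idle has been dragged down, newidle_balance() bails out
before doing any useful work.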