Hi Vincent, 2017-02-06 16:07 GMT+08:00 Vincent Guittot <[email protected]>: > Hi Wanpeng > > On 5 February 2017 at 10:57, Wanpeng Li <[email protected]> wrote: >> From: Wanpeng Li <[email protected]> >> >> The commit: >> c5afb6a87f2 ("sched/fair: Fix nohz.next_balance update") >> >> intends to update nohz.next_balance in two steps. >> >> 1) The ILB CPU utilizes next_balance variable in nohz_idle_balance() >> to gather the shortest next balance of other idle CPUs before >> updating nohz.next_balance. >> 2) The ILB CPU updates the nohz.next_balance according to its own >> next_balance after load balance on behalf of other idle CPUs. >> >> However, there is a mess which breaks the original intention of the > > Have you got details of the mess that this generates ? > >> first step, every idle CPUs update nohz.next_balance during ILB CPU >> on behalf of them to do load balance, and then the ILB CPU utilizes >> next_balance variable in nohz_idle_balance() to gather the shortest >> next balance of other idle CPUs before updating nohz.next_balance. >> >> This patch fixes it by don't update nohz.next_balance for other idle >> CPUs when ILB CPU on behalf of them to do load balance. > > But how do you take into account the next balance of other idle CPUs ?
The step 1) which I describe above for your original commit takes it into account. In addition, please refers to the comments which you added(rebalance_domains()) in the original commit: /* * If this CPU has been elected to perform the nohz idle * balance. Other idle CPUs have already rebalanced with * nohz_idle_balance() and nohz.next_balance has been * updated accordingly. This CPU is now running the idle load * balance for itself and we need to update the * nohz.next_balance accordingly. */ > >> >> Cc: Vincent Guittot <[email protected]> >> Cc: Peter Zijlstra (Intel) <[email protected]> >> CC: Ingo Molnar <[email protected]> >> Signed-off-by: Wanpeng Li <[email protected]> >> --- >> kernel/sched/fair.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c >> index 274c747..83948a4 100644 >> --- a/kernel/sched/fair.c >> +++ b/kernel/sched/fair.c >> @@ -8750,7 +8750,8 @@ static void rebalance_domains(struct rq *rq, enum >> cpu_idle_type idle) >> * balance for itself and we need to update the >> * nohz.next_balance accordingly. >> */ >> - if ((idle == CPU_IDLE) && time_after(nohz.next_balance, >> rq->next_balance)) >> + if ((idle == CPU_IDLE) && time_after(nohz.next_balance, >> rq->next_balance) && >> + !test_bit(NOHZ_BALANCE_KICK, >> nohz_flags(this_rq()->cpu))) >> nohz.next_balance = rq->next_balance; > > With this change only the ILB CPU will update the nohz.next_balance > but what about the next_balance of other idle CPUs ? > The nohz.next_balance must be the next_balance of all idle CPU not only the > ILB. > So an idle CPU (other than the ILB) will have to wait for the ILB > CPU's period evcen if it has shorter load balance period Ditto. Regards, Wanpeng Li

