* Thomas Gleixner <[email protected]> [2016-07-12 18:33:56]:

> Anton,
>
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > > It really does not matter when we fold the load for the outgoing cpu.
> > > It's almost dead anyway, so there is no harm if we fail to fold the
> > > few microseconds which are required for going fully away.
> >
> > We are seeing the load average shoot up when hot unplugging CPUs (+1
> > for every CPU we offline) on ppc64. This reproduces on bare metal as
> > well as inside a KVM guest. A bisect points at this commit.
> >
> > As an example, a completely idle box with 128 CPUS and 112 hot
> > unplugged:
> >
> > # uptime
> > 04:35:30 up 1:23, 2 users, load average: 112.43, 122.94, 125.54
>
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?
Hi Thomas,

Yes, this patch fixes the issue. I was able to recreate the problem and
also verify the fix with this patch on 4.7.0-rc7.

> Thanks,
>
> 	tglx
>
> 8<----------------------
>
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <[email protected]>
>
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a cpu unplug leakes a load of 1 into the global load
> accounting mechanism.
>
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
>
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard <[email protected]>

Tested-by: Vaidyanathan Srinivasan <[email protected]>

> Signed-off-by: Thomas Gleixner <[email protected]>
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
>
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
>
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
>
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;

 	if (nr_active != this_rq->calc_load_active) {
 		delta = nr_active - this_rq->calc_load_active;
 		this_rq->calc_load_active = nr_active;
 	}

 	return delta;

Does the above calculation still hold good even when we pass adjust = 1
and bump down nr_active? It tested OK though :)

--Vaidy
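
For reference, a minimal userspace sketch of the calc_load_fold_active()
arithmetic above -- illustrative only, not the kernel code; the rq struct
and the values here are made up for the example. It models an outgoing CPU
that is otherwise idle after migrate_tasks(), so the only runnable task is
the teardown thread itself: with adjust = 0 the fold leaks a delta of +1
into calc_load_tasks for every unplugged CPU, while adjust = 1 discounts
the teardown thread and folds 0.

/*
 * Userspace sketch of calc_load_fold_active() -- illustrative only,
 * not kernel code; struct and numbers are made up for the example.
 */
#include <stdio.h>

struct rq {
	long nr_running;
	long nr_uninterruptible;
	long calc_load_active;
};

static long calc_load_fold_active(struct rq *this_rq, long adjust)
{
	long nr_active, delta = 0;

	/* adjust = 1 discounts the teardown thread calling this. */
	nr_active = this_rq->nr_running - adjust;
	nr_active += this_rq->nr_uninterruptible;

	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}
	return delta;
}

int main(void)
{
	/*
	 * Outgoing CPU after migrate_tasks(): the only runnable task left
	 * is the teardown thread itself, and nothing was folded before.
	 */
	struct rq with = { .nr_running = 1, .nr_uninterruptible = 0,
			   .calc_load_active = 0 };
	struct rq without = with;

	/* Old behaviour: the teardown thread leaks +1 per unplugged CPU. */
	printf("delta, adjust = 0: %ld\n", calc_load_fold_active(&without, 0));

	/* Fixed behaviour: the teardown thread is discounted, delta is 0. */
	printf("delta, adjust = 1: %ld\n", calc_load_fold_active(&with, 1));

	return 0;
}

Compiling and running this prints a delta of 1 for the old behaviour and 0
with the fix, which matches the +1 load average per offlined CPU seen in
the original report.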

