* Thomas Gleixner <[email protected]> [2016-07-12 18:33:56]:

> Anton,
>
> On Tue, 12 Jul 2016, Anton Blanchard wrote:
> > > It really does not matter when we fold the load for the outgoing cpu.
> > > It's almost dead anyway, so there is no harm if we fail to fold the
> > > few microseconds which are required for going fully away.
> >
> > We are seeing the load average shoot up when hot unplugging CPUs (+1
> > for every CPU we offline) on ppc64. This reproduces on bare metal as
> > well as inside a KVM guest. A bisect points at this commit.
> >
> > As an example, a completely idle box with 128 CPUS and 112 hot
> > unplugged:
> >
> > # uptime
> > 04:35:30 up 1:23, 2 users, load average: 112.43, 122.94, 125.54
>
> Yes, it's an off by one as we now call that from the task which is tearing
> down the cpu. Does the patch below fix it?
Hi Thomas,

Yes, this patch fixes the issue. I was able to recreate the problem and
also verify the fix with this patch on 4.7.0-rc7.

> Thanks,
>
> 	tglx
>
> 8<----------------------
>
> Subject: sched/migration: Correct off by one in load migration
> From: Thomas Gleixner <[email protected]>
>
> The move of calc_load_migrate() from CPU_DEAD to CPU_DYING did not take into
> account that the function is now called from a thread running on the outgoing
> CPU. As a result a cpu unplug leakes a load of 1 into the global load
> accounting mechanism.
>
> Fix it by adjusting for the currently running thread which calls
> calc_load_migrate().
>
> Fixes: e9cd8fa4fcfd: "sched/migration: Move calc_load_migrate() into CPU_DYING"
> Reported-by: Anton Blanchard <[email protected]>

Tested-by: Vaidyanathan Srinivasan <[email protected]>

> Signed-off-by: Thomas Gleixner <[email protected]>
>
> ---
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 51d7105f529a..97ee9ac7e97c 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -5394,13 +5394,15 @@ void idle_task_exit(void)
>  /*
>   * Since this CPU is going 'away' for a while, fold any nr_active delta
>   * we might have. Assumes we're called after migrate_tasks() so that the
> - * nr_active count is stable.
> + * nr_active count is stable. We need to take the teardown thread which
> + * is calling this into account, so we hand in adjust = 1 to the load
> + * calculation.
>   *
>   * Also see the comment "Global load-average calculations".
>   */
>  static void calc_load_migrate(struct rq *rq)
>  {
> -	long delta = calc_load_fold_active(rq);
> +	long delta = calc_load_fold_active(rq, 1);
>
>  	if (delta)
>  		atomic_long_add(delta, &calc_load_tasks);
>  }
> diff --git a/kernel/sched/loadavg.c b/kernel/sched/loadavg.c
> index b0b93fd33af9..a2d6eb71f06b 100644
> --- a/kernel/sched/loadavg.c
> +++ b/kernel/sched/loadavg.c
> @@ -78,11 +78,11 @@ void get_avenrun(unsigned long *loads, unsigned long offset, int shift)
>  	loads[2] = (avenrun[2] + offset) << shift;
>  }
>
> -long calc_load_fold_active(struct rq *this_rq)
> +long calc_load_fold_active(struct rq *this_rq, long adjust)
>  {
>  	long nr_active, delta = 0;
>
> -	nr_active = this_rq->nr_running;
> +	nr_active = this_rq->nr_running - adjust;
>  	nr_active += (long)this_rq->nr_uninterruptible;

 	if (nr_active != this_rq->calc_load_active) {
 		delta = nr_active - this_rq->calc_load_active;
 		this_rq->calc_load_active = nr_active;
 	}

 	return delta;

Does the above calculation still hold good even when we pass adjust = 1
and bump down nr_active? It tested OK though :)

--Vaidy
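
For reference, a minimal userspace sketch of the calc_load_fold_active()
arithmetic above -- illustrative only, not the kernel code; the rq struct
and the values here are made up for the example. It models an outgoing CPU
that is otherwise idle after migrate_tasks(), so the only runnable task is
the teardown thread itself: with adjust = 0 the fold leaks a delta of +1
into calc_load_tasks for every unplugged CPU, while adjust = 1 discounts
the teardown thread and folds 0.

/*
 * Userspace sketch of calc_load_fold_active() -- illustrative only,
 * not kernel code; struct and numbers are made up for the example.
 */
#include <stdio.h>

struct rq {
	long nr_running;
	long nr_uninterruptible;
	long calc_load_active;
};

static long calc_load_fold_active(struct rq *this_rq, long adjust)
{
	long nr_active, delta = 0;

	/* adjust = 1 discounts the teardown thread calling this. */
	nr_active = this_rq->nr_running - adjust;
	nr_active += this_rq->nr_uninterruptible;

	if (nr_active != this_rq->calc_load_active) {
		delta = nr_active - this_rq->calc_load_active;
		this_rq->calc_load_active = nr_active;
	}
	return delta;
}

int main(void)
{
	/*
	 * Outgoing CPU after migrate_tasks(): the only runnable task left
	 * is the teardown thread itself, and nothing was folded before.
	 */
	struct rq with = { .nr_running = 1, .nr_uninterruptible = 0,
			   .calc_load_active = 0 };
	struct rq without = with;

	/* Old behaviour: the teardown thread leaks +1 per unplugged CPU. */
	printf("delta, adjust = 0: %ld\n", calc_load_fold_active(&without, 0));

	/* Fixed behaviour: the teardown thread is discounted, delta is 0. */
	printf("delta, adjust = 1: %ld\n", calc_load_fold_active(&with, 1));

	return 0;
}

Compiling and running this prints a delta of 1 for the old behaviour and 0
with the fix, which matches the +1 load average per offlined CPU seen in
the original report.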

