fair: rework load_balance

Vincent Guittot Wed, 31 Jul 2019 08:38:01 -0700

On Wed, 31 Jul 2019 at 15:44, Srikar Dronamraju
<[email protected]> wrote:
>
> * Vincent Guittot <[email protected]> [2019-07-26 16:42:53]:
>
> > On Fri, 26 Jul 2019 at 15:59, Srikar Dronamraju
> > <[email protected]> wrote:
> > > > @@ -7361,19 +7357,46 @@ static int detach_tasks(struct lb_env *env)
> > > >               if (!can_migrate_task(p, env))
> > > >                       goto next;
> > > >
> > > > -             load = task_h_load(p);
> > > > +             if (env->src_grp_type == migrate_load) {
> > > > +                     unsigned long load = task_h_load(p);
> > > >
> > > > -             if (sched_feat(LB_MIN) && load < 16 && 
> > > > !env->sd->nr_balance_failed)
> > > > -                     goto next;
> > > > +                     if (sched_feat(LB_MIN) &&
> > > > +                         load < 16 && !env->sd->nr_balance_failed)
> > > > +                             goto next;
> > > > +
> > > > +                     if ((load / 2) > env->imbalance)
> > > > +                             goto next;
> > >
> > > I know this existed before too but if the load is exactly or around 2x of
> > > env->imbalance, the resultant imbalance after the load balance operation
> > > would still be around env->imbalance. We may lose some cache affinity too.
> > >
> > > Can we do something like.
> > >                 if (2 * load > 3 * env->imbalance)
> > >                         goto next;
> >
> > TBH, I don't know what should be the best value and it's probably
> > worth doing some investigation but i would prefer to do that as a
> > separate patch to get a similar behavior in the overloaded case
> > Why do you propose 3/2 instead of 2 ?
> >
>
> If the imbalance is exactly or around load/2, then we still select the task 
> to migrate
> However after the migrate the imbalance will still be load/2.
> - Can this lead to ping/pong?


In some case you're probably right but this might also help to move
other tasks when  3/2 will not.

1st example with 3 tasks with almost same load on 2 cpus, we will
probably end up migrating the waiting task with a load  / 2 ~<
env->imbalance whereas 2 * load > 3 * env->imbalance will not move any
task.
Note that we don't ensure fairness between the 3 tasks in the latter
case. TBH I don't know what is better

2nd example with 2 tasks TA and TB  with a load around L and 4 tasks
T0 T1 T2 T3 with a load around L/4
TA and TB are on CPU0 and T0 to T3 are on CPU1, we have the same
imbalance as previous example 2L on CPU0 and L on CPU1.
With load / 2 > env->imbalance, we will migrate TA or TB to CPU1 and
then, we might end up migrating 2 tasks among T0 to T3 and the system
will be balanced
With 2 * load > 3 * env->imbalance, we will not migrate TA or TB
because task load is higher the the imbalance which is L/2 and the
scheduler will never get a chance to balance the system whereas it
could have reached a balanced state

Just to say that I'm not sure that there one best ratio to use

> - Did we lose out of cache though we didn't gain from an imbalance.
>
> > > >
> > > >
> > > > -     if (sgs->sum_h_nr_running)
> > > > -             sgs->load_per_task = sgs->group_load / 
> > > > sgs->sum_h_nr_running;
> > > > +     sgs->group_capacity = group->sgc->capacity;
> > > >
> > > >       sgs->group_weight = group->group_weight;
> > > >
> > > > -     sgs->group_no_capacity = group_is_overloaded(env, sgs);
> > > > -     sgs->group_type = group_classify(group, sgs);
> > > > +     sgs->group_type = group_classify(env, group, sgs);
> > > > +
> > > > +     /* Computing avg_load makes sense only when group is overloaded */
> > > > +     if (sgs->group_type != group_overloaded)
> > > > +             sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) /
> > > > +                             sgs->group_capacity;
> > >
> > > Mismatch in comment and code?
> >
> > I may need to add more comments but at this step, the group should be
> > either overloaded or fully busy but it can also be imbalanced.
> > In case of a group fully busy or imbalanced (sgs->group_type !=
> > group_overloaded), we haven't computed avg_load yet so we have to do
> > so because:
> > -In the case of fully_busy, we are going to be overloaded which the
> > next step after fully busy when you are about to pull more load
> > -In case of imbalance, we don't know the real state of the local group
> > so we fall back to this default behavior
> >
>
> We seem to be checking for avg_load when the group_type is group_overloaded.
> But somehow I am don't see where sgs->avg_load is calculated for
> group_overloaded case.

My fault, I read to quickly your comment and thought that you were
referring to calculte_imbalance() but your comment is about
update_sg_lb_stats().
As Peter mentioned previously this should be
  if (sgs->group_type == group_overloaded)
instead of
  if (sgs->group_type != group_overloaded)

>
> > >
> > > We calculated avg_load for !group_overloaded case, but seem to be using 
> > > for
> > > group_overloaded cases too.
> >
> > for group_overloaded case, we already computed it in update_sg_lb_stats()
> >
>
>
> From update_sg_lb_stats()
>
>         if (sgs->group_type != group_overloaded)
>                 sgs->avg_load = (sgs->group_load*SCHED_CAPACITY_SCALE) /
>                                 sgs->group_capacity;
>
> So we seem to be skipping calculation of avg_load for group_overloaded. No?

yes.

Thanks
Vincent

>
>
> --
> Thanks and Regards
> Srikar Dronamraju
>

Re: [PATCH 3/5] sched/fair: rework load_balance

Reply via email to