On Fri, Apr 29, 2016 at 08:32:40PM +0100, Dietmar Eggemann wrote:
> From: Morten Rasmussen <morten.rasmus...@arm.com>
>
> In calculate_imbalance() load_above_capacity currently has the unit
> [load] while it is used as being [load/capacity]. Not only is it wrong,
> it also makes it unlikely that load_above_capacity is ever used, as the
> subsequent code picks the smaller of load_above_capacity and the
> avg_load difference between the local and the busiest sched_groups.
>
> This patch ensures that load_above_capacity has the right unit
> [load/capacity].
>
> Signed-off-by: Morten Rasmussen <morten.rasmus...@arm.com>
> ---
>  kernel/sched/fair.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
> index bc19fee94f39..c066574cff04 100644
> --- a/kernel/sched/fair.c
> +++ b/kernel/sched/fair.c
> @@ -7016,9 +7016,11 @@ static inline void calculate_imbalance(struct lb_env *env, struct sd_lb_stats *s
>  		local->group_type == group_overloaded) {
>  		load_above_capacity = busiest->sum_nr_running *
>  					SCHED_LOAD_SCALE;
Given smp nice and cgroups, this starting point seems somewhat random.
Then again, without cgroups, and assuming all tasks are nice-0, it is
still true. But I think the cgroup muck is rather universal these days.

The above SCHED_LOAD_SCALE is the "1 task" -> load metric, right?

> -		if (load_above_capacity > busiest->group_capacity)
> +		if (load_above_capacity > busiest->group_capacity) {
>  			load_above_capacity -= busiest->group_capacity;
> -		else
> +			load_above_capacity *= SCHED_LOAD_SCALE;
> +			load_above_capacity /= busiest->group_capacity;

And this one does the [load] -> [load/capacity] conversion.

> +		} else
>  			load_above_capacity = ~0UL;
>  	}

Is there anything better we can do? I know there's a fair bit of cruft
in there; but these assumptions that SCHED_LOAD_SCALE is one task just
bug the hell out of me.

Would it make sense to make load_balance()->detach_tasks() ensure
busiest->sum_nr_running >= busiest->group_weight or so?
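To make the unit conversion concrete, here is a minimal standalone
userspace sketch; it is not the kernel code, just the same arithmetic
with the field names shortened (load_above for load_above_capacity) and
made-up numbers: a group of 2 CPUs with ~80% capacity available each,
running 3 nice-0 tasks, and SCHED_LOAD_SCALE hard-coded to 1024 as in
kernels of that era.

#include <stdio.h>

#define SCHED_LOAD_SCALE	1024UL

int main(void)
{
	unsigned long sum_nr_running = 3;	/* tasks on the busiest group */
	unsigned long group_capacity = 2 * 819;	/* 2 CPUs at ~80% of 1024 */

	/* one SCHED_LOAD_SCALE of load per task, i.e. unit [load] */
	unsigned long load_above = sum_nr_running * SCHED_LOAD_SCALE;

	if (load_above > group_capacity) {
		/* the old code stopped here: result still in [load] */
		load_above -= group_capacity;
		printf("[load]:          %lu\n", load_above);	/* 1434 */

		/* the patch: scale by SCHED_LOAD_SCALE / capacity */
		load_above = load_above * SCHED_LOAD_SCALE / group_capacity;
		printf("[load/capacity]: %lu\n", load_above);	/* 896 */
	}
	return 0;
}

Only after that last scaling is the value comparable with the avg_load
delta it gets min()'d against, since avg_load is (IIRC) likewise
group_load * SCHED_LOAD_SCALE / group_capacity.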