On Wed, Feb 04, 2015 at 06:31:11PM +0000, Morten Rasmussen wrote: > Make wake-ups of new tasks (find_idlest_group) aware of any differences > in cpu compute capacity so new tasks don't get handed off to a cpus with > lower capacity.
That says what; but fails to explain why we want to do this. Remember Changelogs should answer what+why and if complicated also reason about how. > @@ -4971,6 +4972,7 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > > /* Tally up the load of all CPUs in the group */ > avg_load = 0; > + cpu_capacity = 0; > > for_each_cpu(i, sched_group_cpus(group)) { > /* Bias balancing toward cpus of our domain */ > @@ -4980,6 +4982,9 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > load = target_load(i, load_idx); > > avg_load += load; > + > + if (cpu_capacity < capacity_of(i)) > + cpu_capacity = capacity_of(i); > } > > /* Adjust by relative CPU capacity of the group */ So basically you're constructing the max_cpu_capacity for that group here. Might it be clearer to explicitly name/write it as such? max_cpu_capacity = max(max_cpu_capacity, capacity_of(i)); > @@ -4987,14 +4992,20 @@ find_idlest_group(struct sched_domain *sd, struct > task_struct *p, > > if (local_group) { > this_load = avg_load; > + this_cpu_cap = cpu_capacity; > } else if (avg_load < min_load) { > min_load = avg_load; > idlest = group; > + idlest_cpu_cap = cpu_capacity; > } > } while (group = group->next, group != sd->groups); > > - if (!idlest || 100*this_load < imbalance*min_load) > + if (!idlest) > + return NULL; > + > + if (100*this_load < imbalance*min_load && this_cpu_cap >= > idlest_cpu_cap) > return NULL; And here you then fail to provide an idlest group if the selected group has less (max) capacity than the current group. /me goes double check wth capacity_of() returns again, yes, this seems dubious. Suppose we have our two symmetric clusters and for some reason we've routed all our interrupts to one cluster and every cpu regularly receives interrupts. This means that the capacity_of() of this IRQ cluster is always less than the other. The above modification will result in tasks always being placed on the other cluster, even though it might be very busy indeed. If you want to do something like this; one should really add in a current usage metric or so. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/