Hi Paul, Peter etc.

I've found a couple of bugs in the load tracking code, and here is my attempt to fix them. I have some test code available that can trigger the issues if anyone is interested.
The first one is straightforward. A short sleep can leave a stale value in se.avg.decay_count. If that task is later migrated while runnable, the left-over count looks like unaccounted sleep time, so the load is decayed when it should not be.

The second one is similar. Here we lose sleep time for a task if it is migrated while sleeping and the CPU it previously ran on has entered nohz mode. I don't really like this fix much, but the root of the problem is that load tracking more or less expects the runqueue's decay_counter to be up to date, and when nohz is in use it is not. The fix demonstrates the issue anyway; I haven't seen other occasions where nohz CPUs distort the tracked load.

Chris Redpath (2):
  sched: reset blocked load decay_count during synchronization
  sched: update runqueue clock before migrations away

 kernel/sched/fair.c | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)

--
1.7.9.5
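For anyone who wants to see the first issue in isolation, here is a minimal user-space sketch. It is not the kernel code and not the patch itself: the toy_* names, the halving decay, and the reset_on_zero switch are all assumptions made up for illustration. It only models a per-task decay_count snapshot being compared against a per-runqueue decay_counter, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, not the kernel's. */
struct toy_rq { uint64_t decay_counter; };            /* periods decayed on this runqueue */
struct toy_se { uint64_t load; uint64_t decay_count; }; /* snapshot taken when task blocks */

/* Halve the load once per period; a simplified stand-in for the real decay. */
static uint64_t toy_decay(uint64_t load, uint64_t periods)
{
    return periods >= 64 ? 0 : load >> periods;
}

/*
 * Apply any decay the task missed while blocked.
 * reset_on_zero = 0 models the behaviour described as buggy: when no
 * period has elapsed, the snapshot is left behind in decay_count.
 */
static uint64_t toy_sync(struct toy_rq *rq, struct toy_se *se, int reset_on_zero)
{
    assert(rq->decay_counter >= se->decay_count);
    uint64_t decays = rq->decay_counter - se->decay_count;

    if (!decays) {
        if (reset_on_zero)
            se->decay_count = 0;
        return 0;
    }
    se->load = toy_decay(se->load, decays);
    se->decay_count = 0;
    return decays;
}

/* Short sleep, then run for three periods, then migrate while runnable. */
static uint64_t toy_scenario(int reset_on_zero)
{
    struct toy_rq rq = { .decay_counter = 5 };
    struct toy_se se = { .load = 1024, .decay_count = 0 };

    se.decay_count = rq.decay_counter;   /* task blocks: take snapshot */
    toy_sync(&rq, &se, reset_on_zero);   /* wakes before any period elapses */

    rq.decay_counter += 3;               /* task is runnable for three periods */

    if (se.decay_count)                  /* migration path: non-zero looks like sleep */
        toy_sync(&rq, &se, reset_on_zero);

    return se.load;
}
```

In this model toy_scenario(0) returns 128, because the stale snapshot makes three runnable periods look like sleep and the load is halved three times, while toy_scenario(1) returns 1024 because the snapshot is cleared at wakeup and the migration path skips the decay.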

