On 4 July 2017 at 09:27, Peter Zijlstra <[email protected]> wrote:
> On Sat, Jul 01, 2017 at 07:06:13AM +0200, Vincent Guittot wrote:
>> The running state is a subset of runnable state which means that running
>> can't be set if runnable (weight) is cleared. There are corner cases
>> where the current sched_entity has been already dequeued but cfs_rq->curr
>> has not been updated yet and still points to the dequeued sched_entity.
>> If ___update_load_avg is called at that time, weight will be 0 and running
>> will be set which is not possible.
>>
>> This case happens during pick_next_task_fair() when a cfs_rq becomes idles.
>> The current sched_entity has been dequeued so se->on_rq is cleared and
>> cfs_rq->weight is null. But cfs_rq->curr still points to se (it will be
>> cleared when picking the idle thread). Because the cfs_rq becomes idle,
>> idle_balance() is called and ends up to call update_blocked_averages()
>> with these wrong running and runnable states.
>>
>> Add a test in ___update_load_avg to correct the running state in this case.
>
> Cute, however did you find that ?

I found it while rebasing and running more tests on my patch "update scale invariance of PELT", which changes how the load and utilization are scaled. I saw that sometimes the utilization was increasing but not the load when the CPU was going into idle state, because the stolen_idle time was accounted as idle time for load but as running time for utilization. That scale invariance patch is what highlighted the problem.
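
To make the symptom concrete, here is a minimal standalone sketch of the state correction and of why load and utilization diverge without it. It is a userspace illustration only, not the kernel code: struct avg_sketch, correct_running() and accumulate_sketch() are hypothetical names, and decay, frequency/CPU scaling and the period bookkeeping are deliberately left out.

```c
#include <stdint.h>

/* Hypothetical, heavily simplified stand-in for the PELT sums. */
struct avg_sketch {
	uint64_t load_sum;	/* accrues only while runnable (weight != 0) */
	uint64_t util_sum;	/* accrues only while running */
};

/*
 * running is a subset of runnable, so it cannot be set when weight is 0.
 * This mirrors the test described in the patch; the helper itself is an
 * assumption made for illustration.
 */
static int correct_running(unsigned long weight, int running)
{
	if (!weight)
		running = 0;
	return running;
}

/* Accumulate one contribution, with the state corrected first. */
static void accumulate_sketch(struct avg_sketch *sa, unsigned long weight,
			      int running, uint32_t contrib)
{
	running = correct_running(weight, running);

	if (weight)
		sa->load_sum += (uint64_t)weight * contrib;
	if (running)
		sa->util_sum += contrib;
}
```

Without the correct_running() step, a call with weight == 0 and running == 1 (the dequeued-but-still-curr case) would add to util_sum while leaving load_sum untouched, which is the divergence described above.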

