On 3 May 2017 at 15:30, Juri Lelli <[email protected]> wrote:
> Currently, sugov_next_freq_shared() uses last_freq_update_time as a
> reference to decide when to start considering CPU contributions as
> stale.
>
> However, since last_freq_update_time is set by the last CPU that issued
> a frequency transition, this might cause problems in certain cases. In
> practice, the detection of stale utilization values fails whenever the
> CPU with such values was the last to update the policy. For example (and
> please note again that the SCHED_CPUFREQ_RT flag is not the problem
> here, but only the detection of after how much time that flag has to be
> considered stale), suppose a policy with 2 CPUs:
>
>                CPU0                |               CPU1
>                                    |
>                                    |     RT task scheduled
>                                    |     SCHED_CPUFREQ_RT is set
>                                    |     CPU1->last_update = now
>                                    |     freq transition to max
>                                    |     last_freq_update_time = now
>                                    |
>
>                         more than TICK_NSEC nsecs
>
>                                    |
>      a small CFS wakes up          |
>      CPU0->last_update = now1      |
>      delta_ns(CPU0) < TICK_NSEC*   |
>      CPU0's util is considered     |
>      delta_ns(CPU1) =              |
>       last_freq_update_time -      |
>       CPU1->last_update = 0        |
>       < TICK_NSEC                  |
>      CPU1 is still considered      |
>      CPU1->SCHED_CPUFREQ_RT is set |
>      we stay at max (until CPU1    |
>      exits from idle)              |
>
> * delta_ns is actually negative as now1 > last_freq_update_time
>
> While last_freq_update_time is a sensible reference for rate limiting,
> it doesn't seem to be useful for working around stale CPU states.
>
> Fix the problem by always considering now (time) as the reference for
> deciding when CPUs have stale contributions.
>
> Signed-off-by: Juri Lelli <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
FWIW

Acked-by: Vincent Guittot <[email protected]>
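
For anyone following the two-CPU example in the quoted changelog, here is a minimal userspace sketch of the staleness check with the reference time passed in as a parameter. It is not the kernel code: struct cpu_state, is_stale(), stays_at_max() and the TICK_NSEC_SKETCH value are hypothetical names and numbers made up for illustration. Only the comparison of a per-CPU delta against one tick is taken from the description above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's TICK_NSEC (value assumes HZ=250). */
#define TICK_NSEC_SKETCH 4000000LL

/* Simplified, hypothetical per-CPU state; not the kernel's struct sugov_cpu. */
struct cpu_state {
	int64_t last_update;	/* time of this CPU's last utilization update, in ns */
	bool rt_flag;		/* models SCHED_CPUFREQ_RT being set for this CPU */
};

/* A contribution counts as stale once it is more than one tick older than ref. */
static bool is_stale(const struct cpu_state *cpu, int64_t ref)
{
	int64_t delta_ns = ref - cpu->last_update;

	return delta_ns > TICK_NSEC_SKETCH;
}

/*
 * Returns true if any CPU whose contribution is not stale still has the
 * RT flag set, i.e. the policy would be kept at the maximum frequency.
 */
static bool stays_at_max(const struct cpu_state *cpus, int nr_cpus, int64_t ref)
{
	for (int i = 0; i < nr_cpus; i++) {
		if (is_stale(&cpus[i], ref))
			continue;	/* skip stale contributions */
		if (cpus[i].rt_flag)
			return true;
	}
	return false;
}
```

With CPU1->last_update == last_freq_update_time and CPU0 updating more than one tick later at now1, passing last_freq_update_time as ref leaves CPU1 looking fresh (delta of 0), so stays_at_max() returns true. Passing now1 as ref marks CPU1 as stale and it returns false, which is the behaviour the patch is after.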

