Hi,
Have some more nits below

On 18/12/20 10:32, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  Documentation/scheduler/schedutil.txt |  168 ++++++++++++++++++++++++++++++++++
>  1 file changed, 168 insertions(+)
>
> --- /dev/null
> +++ b/Documentation/scheduler/schedutil.txt

[...]

> +Frequency- / CPU Invariance
> +---------------------------
> +
> +Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
> +for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
> +a big CPU, we allow architectures to scale the time delta with two ratios, one
> +Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
> +
> +For simple DVFS architectures (where software is in full control) we trivially
> +compute the ratio as:
> +
> +            f_cur
> +  r_dvfs := -----
> +            f_max
> +
> +For more dynamic systems where the hardware is in control of DVFS (Intel,
> +ARMv8.4-AMU) we use hardware counters to provide us this ratio. For Intel

Nit: To me this reads as if the presence of AMUs entails 'hardware is in
control of DVFS', which doesn't seem right. How about:

  For more dynamic systems where the hardware is in control of DVFS we use
  hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this ratio.

> +Schedutil / DVFS
> +----------------
> +
> +Every time the scheduler load tracking is updated (task wakeup, task
> +migration, time progression) we call out to schedutil to update the hardware
> +DVFS state.
> +
> +The basis is the CPU runqueue's 'running' metric, which per the above it is
> +the frequency invariant utilization estimate of the CPU. From this we compute
> +a desired frequency like:
> +
> +             max( running, util_est );  if UTIL_EST
> +  u_cfs := { running;                   otherwise
> +
> +  u_clamp := clamp( u_cfs, u_min, u_max )
> +
> +  u := u_cfs + u_rt + u_irq + u_dl;     [approx. see source for more detail]
> +
> +  f_des := min( f_max, 1.25 u * f_max )
> +

In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
below might just bring more confusion; what do you think?

               clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
  u_clamp := { u_cfs + u_rt;                         otherwise

  u := u_clamp + u_irq + u_dl;          [approx. see source for more detail]

(also, does this need a word about runnable rt tasks => goto max?)

> +XXX IO-wait; when the update is due to a task wakeup from IO-completion we
> +boost 'u' above.
> +
> +This frequency is then used to select a P-state/OPP or directly munged into a
> +CPPC style request to the hardware.
> +
> +XXX: deadline tasks (Sporadic Task Model) allows us to calculate a hard f_min
> +required to satisfy the workload.
> +
> +Because these callbacks are directly from the scheduler, the DVFS hardware
> +interaction should be 'fast' and non-blocking. Schedutil supports
> +rate-limiting DVFS requests for when hardware interaction is slow and
> +expensive, this reduces effectiveness.
> +
> +For more information see: kernel/sched/cpufreq_schedutil.c
> +
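
FWIW, to make the arithmetic in the suggested u_clamp variant concrete, here
is a rough userspace sketch of it. This is not the kernel code: the names
sketch_util(), sketch_freq(), clamp_u64() and SKETCH_SCALE are made up for
illustration, utilization is assumed to be fixed point with 1024 meaning
100%, the UCLAMP_TASK case is assumed to be always on, and UTIL_EST, the
IO-wait boost and the "runnable rt => max" case are all ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-point scale: 1024 == 100% utilization. */
#define SKETCH_SCALE 1024ULL

/* Clamp v into [lo, hi]. */
static uint64_t clamp_u64(uint64_t v, uint64_t lo, uint64_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/*
 * u_clamp := clamp(u_cfs + u_rt, u_min, u_max)
 * u       := u_clamp + u_irq + u_dl
 *
 * Assumption: clamping is always applied (the UCLAMP_TASK branch).
 */
static uint64_t sketch_util(uint64_t u_cfs, uint64_t u_rt,
			    uint64_t u_irq, uint64_t u_dl,
			    uint64_t u_min, uint64_t u_max)
{
	assert(u_min <= u_max);
	return clamp_u64(u_cfs + u_rt, u_min, u_max) + u_irq + u_dl;
}

/*
 * f_des := min(f_max, 1.25 * u * f_max)
 *
 * u is in SKETCH_SCALE units; 1.25 * f_max is computed as
 * f_max + f_max / 4 to stay in integer arithmetic.
 */
static uint64_t sketch_freq(uint64_t u, uint64_t f_max)
{
	uint64_t f = (f_max + (f_max >> 2)) * u / SKETCH_SCALE;

	return f < f_max ? f : f_max;
}
```

With f_max = 2000000 (kHz, say), u = 512 gives 1250000, and anything from
roughly 80% utilization upward saturates at f_max because of the 1.25
headroom factor.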