Re: [PATCH] sched: Add schedutil overview

Valentin Schneider Fri, 18 Dec 2020 03:34:34 -0800

Hi,


I have some more nits below.

On 18/12/20 10:32, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) <pet...@infradead.org>
> ---
>  Documentation/scheduler/schedutil.txt |  168 ++++++++++++++++++++++++++++++++++
>  1 file changed, 168 insertions(+)
>
> --- /dev/null
> +++ b/Documentation/scheduler/schedutil.txt
[...]
> +Frequency- / CPU Invariance
> +---------------------------
> +
> +Because consuming the CPU for 50% at 1GHz is not the same as consuming the CPU
> +for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% on
> +a big CPU, we allow architectures to scale the time delta with two ratios, one
> +Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
> +
> +For simple DVFS architectures (where software is in full control) we trivially
> +compute the ratio as:
> +
> +            f_cur
> +  r_dvfs := -----
> +            f_max
> +
> +For more dynamic systems where the hardware is in control of DVFS (Intel,
> +ARMv8.4-AMU) we use hardware counters to provide us this ratio. For Intel

Nit: To me this reads as if the presence of AMUs entails 'hardware is in
control of DVFS', which doesn't seem right. How about:

  For more dynamic systems where the hardware is in control of DVFS we use
  hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this
  ratio.
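
FWIW, for readers who prefer code to prose, the time-delta scaling described
in the quoted section could be sketched as below. This is only an
illustration: the 1024 fixed-point scale, the name r_cpu for the microarch
ratio, and both helper names are my assumptions, not anything taken from the
patch or the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed fixed-point representation: 1024 == 1.0 */
#define RATIO_SHIFT 10

/* Hypothetical helper: num/den as a fixed-point ratio, e.g. f_cur/f_max. */
static uint64_t ratio_fp(uint64_t num, uint64_t den)
{
    assert(den != 0);
    return (num << RATIO_SHIFT) / den;
}

/*
 * Hypothetical helper: scale a time delta by the DVFS ratio and then by
 * the microarch ratio (called r_cpu here, my naming).
 */
static uint64_t scale_delta(uint64_t delta_ns, uint64_t r_dvfs, uint64_t r_cpu)
{
    uint64_t d = (delta_ns * r_dvfs) >> RATIO_SHIFT;

    return (d * r_cpu) >> RATIO_SHIFT;
}
```

With f_cur at half of f_max, ratio_fp() gives 512, so 1ms of wall time on a
full-capacity CPU is accounted as 0.5ms; halving r_cpu as well brings that
down to 0.25ms.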

> +Schedutil / DVFS
> +----------------
> +
> +Every time the scheduler load tracking is updated (task wakeup, task
> +migration, time progression) we call out to schedutil to update the hardware
> +DVFS state.
> +
> +The basis is the CPU runqueue's 'running' metric, which per the above it is
> +the frequency invariant utilization estimate of the CPU. From this we compute
> +a desired frequency like:
> +
> +             max( running, util_est );       if UTIL_EST
> +  u_cfs := { running;                        otherwise
> +
> +  u_clamp := clamp( u_cfs, u_min, u_max )
> +
> +  u := u_cfs + u_rt + u_irq + u_dl;  [approx. see source for more detail]
> +
> +  f_des := min( f_max, 1.25 u * f_max )
> +

In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
below might just bring more confusion; what do you think?

               clamp( u_cfs + u_rt, u_min, u_max );      if UCLAMP_TASK
  u_clamp := { u_cfs + u_rt;                             otherwise

  u := u_clamp + u_irq + u_dl;      [approx. see source for more detail]

(also, does this need a word about runnable rt tasks => goto max?)
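
To make the suggestion above concrete, here is a rough standalone sketch of
the arithmetic I have in mind. It is not the actual schedutil_cpu_util()
code; the helper names, the 1024 scale and the integer form of the 1.25
factor are assumptions made for the example, and it ignores the rt =>
goto max case.

```c
#include <assert.h>

#define SCALE 1024UL    /* assumed fixed-point 1.0 for utilization */

static unsigned long clamp_ul(unsigned long v, unsigned long lo,
                              unsigned long hi)
{
    assert(lo <= hi);
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Hypothetical: uclamp applies to u_cfs + u_rt, not to u_irq or u_dl. */
static unsigned long sketch_util(unsigned long u_cfs, unsigned long u_rt,
                                 unsigned long u_irq, unsigned long u_dl,
                                 unsigned long u_min, unsigned long u_max,
                                 int uclamp_task)
{
    unsigned long u_clamp = u_cfs + u_rt;

    if (uclamp_task)
        u_clamp = clamp_ul(u_clamp, u_min, u_max);

    return u_clamp + u_irq + u_dl;
}

/* Hypothetical: f_des := min( f_max, 1.25 u * f_max ), u in [0, SCALE]. */
static unsigned long sketch_freq(unsigned long u, unsigned long f_max)
{
    unsigned long f_des = (f_max + (f_max >> 2)) * u / SCALE;

    return f_des < f_max ? f_des : f_max;
}
```

For instance, with u_cfs = 300, u_rt = 100, u_irq = 20, u_dl = 50 and a
uclamp max of 256, the clamped sum is 256 and u ends up at 326; the irq and
deadline contributions are added after the clamp.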

> +XXX IO-wait; when the update is due to a task wakeup from IO-completion we
> +boost 'u' above.
> +
> +This frequency is then used to select a P-state/OPP or directly munged into a
> +CPPC style request to the hardware.
> +
> +XXX: deadline tasks (Sporadic Task Model) allows us to calculate a hard f_min
> +required to satisfy the workload.
> +
> +Because these callbacks are directly from the scheduler, the DVFS hardware
> +interaction should be 'fast' and non-blocking. Schedutil supports
> +rate-limiting DVFS requests for when hardware interaction is slow and
> +expensive, this reduces effectiveness.
> +
> +For more information see: kernel/sched/cpufreq_schedutil.c
> +
