Re: [PATCH] sched: Add schedutil overview

2020-12-18 Thread Valentin Schneider


On 18/12/20 13:40, Morten Rasmussen wrote:
> On Fri, Dec 18, 2020 at 11:33:09AM +, Valentin Schneider wrote:
>> (also, does this need a word about runnable rt tasks => goto max?)
>
> What is actually the intended policy there? I thought it was goto max
> unless rt was clamped, but if I read the code correctly in
> schedutil_cpu_util() the current policy is only goto max if uclamp isn't
> in use at all, including cfs.
>

Right, so the policy pretty much is: by default, if there are runnable rt
tasks, goto max freq.

When uclamp isn't used, that's hardcoded.

When uclamp is in use, the default RT uclamp.min is 1024, so it "naturally"
drives frequency selection to the max when there are runnable RT tasks
(rq-aggregated uclamp.min == 1024). That default
(uclamp_util_min_rt_default) can be tweaked.

> The write-up looks good to me.
>
> Reviewed-by: Morten Rasmussen 
>
> Morten


Re: [PATCH] sched: Add schedutil overview

2020-12-18 Thread Morten Rasmussen
On Fri, Dec 18, 2020 at 11:33:09AM +, Valentin Schneider wrote:
> On 18/12/20 10:32, Peter Zijlstra wrote:
> > +Schedutil / DVFS
> > +
> > +
> > +Every time the scheduler load tracking is updated (task wakeup, task
> > +migration, time progression) we call out to schedutil to update the 
> > hardware
> > +DVFS state.
> > +
> > +The basis is the CPU runqueue's 'running' metric, which per the above it is
> > +the frequency invariant utilization estimate of the CPU. From this we 
> > compute
> > +a desired frequency like:
> > +
> > + max( running, util_est ); if UTIL_EST
> > +  u_cfs := { running;  otherwise
> > +
> > +  u_clamp := clamp( u_cfs, u_min, u_max )
> > +
> > +  u := u_cfs + u_rt + u_irq + u_dl;[approx. see source for more 
> > detail]
> > +
> > +  f_des := min( f_max, 1.25 u * f_max )
> > +
> 
> In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
> below might just bring more confusion; what do you think?
> 
>clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
>   u_clamp := { u_cfs + u_rt; otherwise
> 
>   u := u_clamp + u_irq + u_dl;[approx. see source for more detail]

It is reflecting the code so I think it is worth it. It also fixes the
typo in the original sum (u_cfs -> u_clamp).

> (also, does this need a word about runnable rt tasks => goto max?)

What is actually the intended policy there? I thought it was goto max
unless rt was clamped, but if I read the code correctly in
schedutil_cpu_util() the current policy is only goto max if uclamp isn't
in use at all, including cfs.

The write-up looks good to me.

Reviewed-by: Morten Rasmussen 

Morten


Re: [PATCH] sched: Add schedutil overview

2020-12-18 Thread Valentin Schneider


Hi,

Have some more nits below

On 18/12/20 10:32, Peter Zijlstra wrote:
> Signed-off-by: Peter Zijlstra (Intel) 
> ---
>  Documentation/scheduler/schedutil.txt |  168 
> ++
>  1 file changed, 168 insertions(+)
>
> --- /dev/null
> +++ b/Documentation/scheduler/schedutil.txt
[...]
> +Frequency- / CPU Invariance
> +---
> +
> +Because consuming the CPU for 50% at 1GHz is not the same as consuming the 
> CPU
> +for 50% at 2GHz, nor is running 50% on a LITTLE CPU the same as running 50% 
> on
> +a big CPU, we allow architectures to scale the time delta with two ratios, 
> one
> +Dynamic Voltage and Frequency Scaling (DVFS) ratio and one microarch ratio.
> +
> +For simple DVFS architectures (where software is in full control) we 
> trivially
> +compute the ratio as:
> +
> + f_cur
> +  r_dvfs := -
> +f_max
> +
> +For more dynamic systems where the hardware is in control of DVFS (Intel,
> +ARMv8.4-AMU) we use hardware counters to provide us this ratio. For Intel

Nit: To me this reads as if the presence of AMUs entail 'hardware is in
control of DVFS', which doesn't seem right. How about:

  For more dynamic systems where the hardware is in control of DVFS we use
  hardware counters (Intel APERF/MPERF, ARMv8.4-AMU) to provide us this
  ratio.

> +Schedutil / DVFS
> +
> +
> +Every time the scheduler load tracking is updated (task wakeup, task
> +migration, time progression) we call out to schedutil to update the hardware
> +DVFS state.
> +
> +The basis is the CPU runqueue's 'running' metric, which per the above it is
> +the frequency invariant utilization estimate of the CPU. From this we compute
> +a desired frequency like:
> +
> + max( running, util_est );   if UTIL_EST
> +  u_cfs := { running;otherwise
> +
> +  u_clamp := clamp( u_cfs, u_min, u_max )
> +
> +  u := u_cfs + u_rt + u_irq + u_dl;  [approx. see source for more detail]
> +
> +  f_des := min( f_max, 1.25 u * f_max )
> +

In schedutil_cpu_util(), uclamp clamps both u_cfs and u_rt. I'm afraid the
below might just bring more confusion; what do you think?

   clamp( u_cfs + u_rt, u_min, u_max );  if UCLAMP_TASK
  u_clamp := { u_cfs + u_rt; otherwise

  u := u_clamp + u_irq + u_dl;  [approx. see source for more detail]

(also, does this need a word about runnable rt tasks => goto max?)

> +XXX IO-wait; when the update is due to a task wakeup from IO-completion we
> +boost 'u' above.
> +
> +This frequency is then used to select a P-state/OPP or directly munged into a
> +CPPC style request to the hardware.
> +
> +XXX: deadline tasks (Sporadic Task Model) allows us to calculate a hard f_min
> +required to satisfy the workload.
> +
> +Because these callbacks are directly from the scheduler, the DVFS hardware
> +interaction should be 'fast' and non-blocking. Schedutil supports
> +rate-limiting DVFS requests for when hardware interaction is slow and
> +expensive, this reduces effectiveness.
> +
> +For more information see: kernel/sched/cpufreq_schedutil.c
> +