On Thu, Jun 28, 2018 at 05:45:03PM +0200, Vincent Guittot wrote:

> Vincent Guittot (11):
>   sched/pelt: Move pelt related code in a dedicated file
>   sched/rt: add rt_rq utilization tracking
>   cpufreq/schedutil: use rt utilization tracking
>   sched/dl: add dl_rq utilization tracking
>   cpufreq/schedutil: use dl utilization tracking
>   sched/irq: add irq utilization tracking
>   cpufreq/schedutil: take into account interrupt
>   sched: schedutil: remove sugov_aggregate_util()
>   sched: use pelt for scale_rt_capacity()
>   sched: remove rt_avg code
>   proc/sched: remove unused sched_time_avg_ms
> 
>  include/linux/sched/sysctl.h     |   1 -
>  kernel/sched/Makefile            |   2 +-
>  kernel/sched/core.c              |  38 +---
>  kernel/sched/cpufreq_schedutil.c |  65 ++++---
>  kernel/sched/deadline.c          |   8 +-
>  kernel/sched/fair.c              | 403 +++++----------------------------------
>  kernel/sched/pelt.c              | 399 ++++++++++++++++++++++++++++++++++++++
>  kernel/sched/pelt.h              |  72 +++++++
>  kernel/sched/rt.c                |  15 +-
>  kernel/sched/sched.h             |  68 +++++--
>  kernel/sysctl.c                  |   8 -
>  11 files changed, 632 insertions(+), 447 deletions(-)
>  create mode 100644 kernel/sched/pelt.c
>  create mode 100644 kernel/sched/pelt.h
OK, this looks good I suppose. Rafael, are you OK with me taking these?

I have the below on top because I once again forgot how it all worked;
does this work for you Vincent?

---
Subject: sched/cpufreq: Clarify sugov_get_util()

Add a few comments (hopefully) clarifying some of the magic in
sugov_get_util().

Signed-off-by: Peter Zijlstra (Intel) <[email protected]>
---
 cpufreq_schedutil.c |   69 ++++++++++++++++++++++++++++++++++++++--------------
 1 file changed, 51 insertions(+), 18 deletions(-)

--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -177,6 +177,26 @@ static unsigned int get_next_freq(struct
 	return cpufreq_driver_resolve_freq(policy, freq);
 }
 
+/*
+ * This function computes an effective utilization for the given CPU, to be
+ * used for frequency selection given the linear relation: f = u * f_max.
+ *
+ * The scheduler tracks the following metrics:
+ *
+ *   cpu_util_{cfs,rt,dl,irq}()
+ *   cpu_bw_dl()
+ *
+ * Where the cfs,rt and dl util numbers are tracked with the same metric and
+ * synchronized windows and are thus directly comparable.
+ *
+ * The cfs,rt,dl utilization are the running times measured with rq->clock_task
+ * which excludes things like IRQ and steal-time. These latter are then accrued in
+ * the irq utilization.
+ *
+ * The DL bandwidth number otoh is not a measured metric but a value computed
+ * based on the task model parameters and gives the minimal u required to meet
+ * deadlines.
+ */
 static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu)
 {
 	struct rq *rq = cpu_rq(sg_cpu->cpu);
@@ -188,26 +208,50 @@ static unsigned long sugov_get_util(stru
 	if (rt_rq_is_runnable(&rq->rt))
 		return max;
 
+	/*
+	 * Early check to see if IRQ/steal time saturates the CPU, can be
+	 * because of inaccuracies in how we track these -- see
+	 * update_irq_load_avg().
+	 */
 	irq = cpu_util_irq(rq);
-
 	if (unlikely(irq >= max))
 		return max;
 
-	/* Sum rq utilization */
+	/*
+	 * Because the time spent on RT/DL tasks is visible as 'lost' time to
+	 * CFS tasks and we use the same metric to track the effective
+	 * utilization (PELT windows are synchronized) we can directly add them
+	 * to obtain the CPU's actual utilization.
+	 */
 	util = cpu_util_cfs(rq);
 	util += cpu_util_rt(rq);
 
 	/*
-	 * Interrupt time is not seen by rqs utilization nso we can compare
-	 * them with the CPU capacity
+	 * We do not make cpu_util_dl() a permanent part of this sum because we
+	 * want to use cpu_bw_dl() later on, but we need to check if the
+	 * CFS+RT+DL sum is saturated (ie. no idle time) such that we select
+	 * f_max when there is no idle time.
+	 *
+	 * NOTE: numerical errors or stop class might cause us to not quite hit
+	 * saturation when we should -- something for later.
	 */
 	if ((util + cpu_util_dl(rq)) >= max)
 		return max;
 
 	/*
-	 * As there is still idle time on the CPU, we need to compute the
-	 * utilization level of the CPU.
+	 * There is still idle time; further improve the number by using the
+	 * irq metric. Because IRQ/steal time is hidden from the task clock we
+	 * need to scale the task numbers:
 	 *
+	 *              1 - irq
+	 *   U' = irq + ------- * U
+	 *                max
+	 */
+	util *= (max - irq);
+	util /= max;
+	util += irq;
+
+	/*
 	 * Bandwidth required by DEADLINE must always be granted while, for
 	 * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism
 	 * to gracefully reduce the frequency when no tasks show up for longer
@@ -217,18 +261,7 @@ static unsigned long sugov_get_util(stru
 	 * util_cfs + util_dl as requested freq. However, cpufreq is not yet
 	 * ready for such an interface. So, we only do the latter for now.
 	 */
-
-	/* Weight rqs utilization to normal context window */
-	util *= (max - irq);
-	util /= max;
-
-	/* Add interrupt utilization */
-	util += irq;
-
-	/* Add DL bandwidth requirement */
-	util += sg_cpu->bw_dl;
-
-	return min(max, util);
+	return min(max, util + sg_cpu->bw_dl);
 }
 
 /**
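
For readers following along, here is a small standalone user-space sketch of
the arithmetic the new comments describe. This is illustrative only, not
kernel code: effective_util(), the main() wrapper and the sample numbers are
invented for the example; only the order of operations mirrors the patched
sugov_get_util() (sum CFS+RT, bail out to max on IRQ or CFS+RT+DL saturation,
scale by the non-IRQ fraction of the window, then add IRQ and the DL
bandwidth requirement).

#include <stdio.h>

static unsigned long effective_util(unsigned long cfs, unsigned long rt,
				    unsigned long dl, unsigned long irq,
				    unsigned long bw_dl, unsigned long max)
{
	unsigned long util;

	if (irq >= max)			/* IRQ/steal time saturates the CPU */
		return max;

	util = cfs + rt;		/* same PELT metric, directly additive */
	if (util + dl >= max)		/* no idle time left: ask for f_max */
		return max;

	/* U' = irq + U * (max - irq) / max */
	util = util * (max - irq) / max + irq;

	/* DL bandwidth must always be granted on top. */
	util += bw_dl;

	return util < max ? util : max;
}

int main(void)
{
	/* max = 1024 stands in for the CPU capacity; the rest are arbitrary */
	printf("%lu\n", effective_util(300, 50, 40, 128, 100, 1024));
	return 0;
}

With those sample numbers: 350 * 896 / 1024 = 306, plus 128 for IRQ and 100
for DL bandwidth gives 534, i.e. roughly half of f_max would be requested.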

