On 16-07-17, 01:04, Joel Fernandes wrote:
> Currently the iowait_boost feature in schedutil makes the frequency go to max
> on iowait wakeups. This feature was added to handle a case that Peter
> described where the throughput of operations involving continuous I/O requests
> [1] is reduced due to running at a lower frequency, however the lower
> throughput itself causes utilization to be low and hence causing frequency to
> be low hence its "stuck".
>
> Instead of going to max, its also possible to achieve the same effect by
> ramping up to max if there are repeated in_iowait wakeups happening. This patch
> is an attempt to do that. We start from a lower frequency (policy->mind)
s/mind/min/

> and double the boost for every consecutive iowait update until we reach the
> maximum iowait boost frequency (iowait_boost_max).
>
> I ran a synthetic test (continuous O_DIRECT writes in a loop) on an x86 machine
> with intel_pstate in passive mode using schedutil. In this test the
> iowait_boost value ramped from 800MHz to 4GHz in 60ms. The patch achieves the
> desired improved throughput as the existing behavior.
>
> Also while at it, make iowait_boost and iowait_boost_max as unsigned int since
> its unit is kHz and this is consistent with struct cpufreq_policy.
>
> [1] https://patchwork.kernel.org/patch/9735885/
>
> Cc: Srinivas Pandruvada <[email protected]>
> Cc: Len Brown <[email protected]>
> Cc: Rafael J. Wysocki <[email protected]>
> Cc: Viresh Kumar <[email protected]>
> Cc: Ingo Molnar <[email protected]>
> Cc: Peter Zijlstra <[email protected]>
> Suggested-by: Peter Zijlstra <[email protected]>
> Signed-off-by: Joel Fernandes <[email protected]>
> ---
> This version is based on some ideas from Viresh and Juri in v4. Viresh, one
> difference between the idea we just discussed is, I am scaling up/down the
> boost only after consuming it. This has the effect of slightly delaying the
> "deboost" but achieves the same boost ramp time. Its more cleaner in the code
> IMO to avoid the scaling up and then down on the initial boost. Note that I
> also dropped iowait_boost_min and now I'm just starting the initial boost from
> policy->min since as I mentioned in the commit above, the ramp of the
> iowait_boost value is very quick and for the usecase its intended for, it works
> fine. Hope this is acceptable. Thanks.
>
>  kernel/sched/cpufreq_schedutil.c | 31 +++++++++++++++++++++++--------
>  1 file changed, 23 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
> index 622eed1b7658..4225bbada88d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -53,8 +53,9 @@ struct sugov_cpu {
>          struct update_util_data update_util;
>          struct sugov_policy *sg_policy;
>
> -        unsigned long iowait_boost;
> -        unsigned long iowait_boost_max;
> +        bool iowait_boost_pending;
> +        unsigned int iowait_boost;
> +        unsigned int iowait_boost_max;
>          u64 last_update;
>
>          /* The fields below are only needed when sharing a policy. */
> @@ -172,30 +173,43 @@ static void sugov_set_iowait_boost(struct sugov_cpu *sg_cpu, u64 time,
>                                     unsigned int flags)
>  {
>          if (flags & SCHED_CPUFREQ_IOWAIT) {
> -                sg_cpu->iowait_boost = sg_cpu->iowait_boost_max;
> +                sg_cpu->iowait_boost_pending = true;
> +                sg_cpu->iowait_boost = max(sg_cpu->iowait_boost,
> +                                           sg_cpu->sg_policy->policy->min);
>          } else if (sg_cpu->iowait_boost) {
>                  s64 delta_ns = time - sg_cpu->last_update;
>
>                  /* Clear iowait_boost if the CPU apprears to have been idle. */
> -                if (delta_ns > TICK_NSEC)
> +                if (delta_ns > TICK_NSEC) {
>                          sg_cpu->iowait_boost = 0;
> +                        sg_cpu->iowait_boost_pending = false;
> +                }

We don't really need to clear this flag here, as we are already setting
iowait_boost to 0 and that's what we check when using the boost.
>          }
>  }
>
>  static void sugov_iowait_boost(struct sugov_cpu *sg_cpu, unsigned long *util,
>                                 unsigned long *max)
>  {
> -        unsigned long boost_util = sg_cpu->iowait_boost;
> -        unsigned long boost_max = sg_cpu->iowait_boost_max;
> +        unsigned long boost_util, boost_max;
>
> -        if (!boost_util)
> +        if (!sg_cpu->iowait_boost)
>                  return;
>
> +        boost_util = sg_cpu->iowait_boost;
> +        boost_max = sg_cpu->iowait_boost_max;
> +

The above changes are not required anymore (and were required only with
my patch).

>          if (*util * boost_max < *max * boost_util) {
>                  *util = boost_util;
>                  *max = boost_max;
>          }
> -        sg_cpu->iowait_boost >>= 1;
> +
> +        if (sg_cpu->iowait_boost_pending) {
> +                sg_cpu->iowait_boost_pending = false;
> +                sg_cpu->iowait_boost = min(sg_cpu->iowait_boost << 1,
> +                                           sg_cpu->iowait_boost_max);

Now this has a problem: we will also boost after waiting for
rate_limit_us, and that's why I had proposed the tricky solution in the
first place. I thought we wanted to avoid the instant boost only for the
first iteration, but after that we wanted to do it ASAP. Isn't it?

Now that you are using policy->min instead of policy->cur, we can
simplify the solution I proposed and always do 2 * iowait_boost before
getting the current util/max in the above if block, i.e. we will start
the iowait boost at min * 2 instead of min, and that should be fine.

> +        } else {
> +                sg_cpu->iowait_boost >>= 1;
> +        }
>  }
>
>  #ifdef CONFIG_NO_HZ_COMMON
> @@ -267,6 +281,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>                  delta_ns = time - j_sg_cpu->last_update;
>                  if (delta_ns > TICK_NSEC) {
>                          j_sg_cpu->iowait_boost = 0;
> +                        j_sg_cpu->iowait_boost_pending = false;

Not required here either.

>                          continue;
>                  }
>                  if (j_sg_cpu->flags & SCHED_CPUFREQ_RT_DL)
> --
> 2.13.2.932.g7449e964c-goog

--
viresh
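As a quick illustration of the ramp described in the commit message, the small
stand-alone user-space model below (not kernel code, and not part of the
thread) starts the boost at a hypothetical policy->min of 800000 kHz, doubles
it on every consecutive iowait update, and caps it at a hypothetical
iowait_boost_max of 4000000 kHz, mirroring the 800 MHz to 4 GHz figures quoted
above:

/*
 * Toy user-space model (not kernel code) of the ramp described in the
 * commit message: start from policy->min on the first iowait wakeup and
 * double on every consecutive iowait update, capped at iowait_boost_max.
 * The kHz values mirror the 800 MHz -> 4 GHz example quoted above and are
 * illustrative only.
 */
#include <stdio.h>

int main(void)
{
        unsigned int policy_min = 800000;   /* hypothetical policy->min, kHz */
        unsigned int boost_max = 4000000;   /* hypothetical iowait_boost_max, kHz */
        unsigned int boost = policy_min;    /* first iowait wakeup */
        int update = 1;

        printf("update %d: iowait_boost = %u kHz\n", update, boost);

        while (boost < boost_max) {
                boost <<= 1;                /* double on the next consecutive update */
                if (boost > boost_max)
                        boost = boost_max;  /* never exceed the max boost frequency */
                printf("update %d: iowait_boost = %u kHz\n", ++update, boost);
        }

        return 0;
}

Compiled with any C compiler, it shows the boost saturating after a handful of
back-to-back updates, consistent with the quick ramp the commit message relies
on.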

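A second toy model (again user-space only, with the same hypothetical
constants) sketches the ordering question raised in the review: assuming
back-to-back iowait updates spaced at least rate_limit_us apart and a boost
large enough to always win the util/max comparison, it contrasts consuming the
boost and then doubling it (the v5 patch) with doubling it before the
comparison (the suggested min * 2 start):

/*
 * Toy user-space model (not kernel code) of the ordering discussed above,
 * assuming back-to-back iowait updates at least rate_limit_us apart and a
 * boost that always wins the util/max comparison.  "consume-then-double"
 * follows the v5 patch; "double-then-consume" follows the suggestion of
 * doubling before the comparison (i.e. starting from min * 2).
 */
#include <stdio.h>

#define BOOST_MIN       800000U         /* hypothetical policy->min, kHz */
#define BOOST_MAX       4000000U        /* hypothetical iowait_boost_max, kHz */

static unsigned int cap(unsigned int freq)
{
        return freq > BOOST_MAX ? BOOST_MAX : freq;
}

int main(void)
{
        unsigned int v5 = BOOST_MIN;    /* consume, then double */
        unsigned int alt = BOOST_MIN;   /* double, then consume */
        int i;

        printf("update | consume-then-double | double-then-consume\n");
        for (i = 1; i <= 4; i++) {
                unsigned int used_v5 = v5;              /* value used this update */
                unsigned int used_alt = cap(alt << 1);  /* doubled before use */

                printf("%6d | %19u | %19u\n", i, used_v5, used_alt);

                v5 = cap(v5 << 1);      /* takes effect only at the next update */
                alt = used_alt;
        }
        return 0;
}

The doubled value is used one update earlier in the second column, which is
the extra rate_limit_us delay the review points at.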
