On Monday, March 20, 2017 01:06:15 PM Patrick Bellasi wrote: > On 20-Mar 13:50, Peter Zijlstra wrote: > > On Mon, Mar 20, 2017 at 01:35:12PM +0100, Rafael J. Wysocki wrote: > > > On Monday, March 20, 2017 11:36:45 AM Peter Zijlstra wrote: > > > > On Sun, Mar 19, 2017 at 02:34:32PM +0100, Rafael J. Wysocki wrote: > > > > > From: Rafael J. Wysocki <rafael.j.wyso...@intel.com> > > > > > > > > > > The PELT metric used by the schedutil governor underestimates the > > > > > CPU utilization in some cases. The reason for that may be time spent > > > > > in interrupt handlers and similar which is not accounted for by PELT. > > > > > > > > > > That can be easily demonstrated by running kernel compilation on > > > > > a Sandy Bridge Intel processor, running turbostat in parallel with > > > > > it and looking at the values written to the MSR_IA32_PERF_CTL > > > > > register. Namely, the expected result would be that when all CPUs > > > > > were 100% busy, all of them would be requested to run in the maximum > > > > > P-state, but observation shows that this clearly isn't the case. > > > > > The CPUs run in the maximum P-state for a while and then are > > > > > requested to run slower and go back to the maximum P-state after > > > > > a while again. That causes the actual frequency of the processor to > > > > > visibly oscillate below the sustainable maximum in a jittery fashion > > > > > which clearly is not desirable. > > > > > > > > > > To work around this issue use the observation that, from the > > > > > schedutil governor's perspective, CPUs that are never idle should > > > > > always run at the maximum frequency and make that happen. > > > > > > > > > > To that end, add a counter of idle calls to struct sugov_cpu and > > > > > modify cpuidle_idle_call() to increment that counter every time it > > > > > is about to put the given CPU into an idle state. Next, make the > > > > > schedutil governor look at that counter for the current CPU every > > > > > time before it is about to start heavy computations. If the counter > > > > > has not changed for over SUGOV_BUSY_THRESHOLD time (equal to 50 ms), > > > > > the CPU has not been idle for at least that long and the governor > > > > > will choose the maximum frequency for it without looking at the PELT > > > > > metric at all. > > > > > > > > Why the time limit? > > > > > > One iteration appeared to be a bit too aggressive, but honestly I think > > > I need to check again if this thing is regarded as viable at all. > > > > > > > I don't hate the idea; if we don't hit idle; we shouldn't shift down. I > > just wonder if we don't already keep a idle-seqcount somewhere; NOHZ and > > RCU come to mind as things that might already use something like that. > > Maybe the problem is not going down (e.g. when there are only small > CFS tasks it makes perfectly sense) but instead not being fast enough > on rampin-up when a new RT task is activated. > > And this boils down to two main point: > 1) throttling for up transitions perhaps is only harmful > 2) the call sites for schedutils updates are not properly positioned > in specific scheduler decision points. > > The proposed patch is adding yet another throttling mechanism, perhaps > on top of one which already needs to be improved.
It is not throttling anything. Thanks, Rafael