On Thu, 2018-05-17 at 17:04 +0200, Juri Lelli wrote:
> On 17/05/18 12:59, Juri Lelli wrote:
> > On 16/05/18 18:31, Juri Lelli wrote:
> > > On 16/05/18 17:47, Peter Zijlstra wrote:
> > > > On Wed, May 16, 2018 at 05:19:25PM +0200, Juri Lelli wrote:
> > > >
> > > > > Anyway, FWIW I started testing this on a E5-2609 v3 and I'm not seeing
> > > > > hackbench regressions so far (running with schedutil governor).
> > > >
> > > > https://en.wikipedia.org/wiki/Haswell_(microarchitecture)#Server_processors
> > > >
> > > > Lists the E5 2609 v3 as not having turbo at all, which is basically a
> > > > best case scenario for this patch.
> > > >
> > > > As I wrote earlier today; when turbo exists, like say the 2699, then
> > > > when we're busy we'll run at U=2.3/3.6 ~ .64, which might confuse
> > > > things.
> > >
> > > Indeed. I was mostly trying to see if adding this to the tick might
> > > introduce noticeable overhead.
> >
> > Blindly testing on an i5-5200U (2.2/2.7 GHz) gave the following
> >
> > # perf bench sched messaging --pipe --thread --group 2 --loop 20000
> >
> >                       count       mean       std     min     50%       95%       99%     max
> > hostname kernel
> > i5-5200U test_after    30.0  13.843433  0.590605  12.369  13.810  14.85635  15.08205  15.127
> >          test_before   30.0  13.571167  0.999798  12.228  13.302  15.57805  16.40029  16.690
> >
> > It might be interesting to see what happens when using a single CPU
> > only?
> >
> > Also, I will look at how the util signals look when a single CPU is
> > busy..
>
> And this is showing where the problem is (as you were saying [1]):
>
> https://gist.github.com/jlelli/f5438221186e5ed3660194e4f645fe93
>
> Just look at the plots (and ignore setup).
>
> First one (pid:4483) shows a single task busy running on a single CPU,
> which seems to be able to sustain turbo for 5 sec. So task util reaches
> ~1024.
>
> Second one (pid:4283) shows the same task, but running together with
> other 3 tasks (each one pinned to a different CPU). In this case util
> saturates at ~943, which is due to the fact that max freq is still
> considered to be the turbo one. :/
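
To put rough numbers on the saturation described above, here is a
minimal sketch. It assumes the utilization of an always-running task
tops out at 1024 * f_cur / f_max, with f_max taken to be the turbo
frequency. The function names are made up for this illustration and are
not kernel interfaces; the 2.3/3.6 GHz and 2.7 GHz figures are the ones
quoted in the thread.

```python
SCHED_CAPACITY_SCALE = 1024  # full-scale utilization (the ~1024 seen in the first plot)


def saturated_util(cur_khz: int, max_khz: int) -> int:
    """Ceiling for an always-running task when busy time is scaled by cur/max.

    Assumption: max_khz is the frequency treated as "max" by the
    invariance code, i.e. the turbo frequency in the cases discussed here.
    """
    return SCHED_CAPACITY_SCALE * cur_khz // max_khz


def implied_khz(util_ceiling: int, max_khz: int) -> int:
    """Frequency that would explain an observed utilization ceiling."""
    return util_ceiling * max_khz // SCHED_CAPACITY_SCALE


if __name__ == "__main__":
    # 2699 example: busy at 2.3 GHz while max is taken as 3.6 GHz (U ~ .64)
    print("2699 ceiling:", saturated_util(2_300_000, 3_600_000))

    # i5-5200U: a ceiling of ~943 with max taken as 2.7 GHz
    print("i5-5200U implied kHz:", implied_khz(943, 2_700_000))
```

Under that assumption the 2699 case would saturate around 654, and the
~943 ceiling on the i5-5200U would correspond to the four busy CPUs
running at roughly 2.49 GHz rather than the 2.7 GHz single-core turbo.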

One more point to note. Even if we calculate some utilization based on
frequency invariance and arrive at a P-state, we will not be able to
control any P-state in the turbo region (not even as a cap) on several
Intel processors using the PERF_CTL MSRs.

>
> [1] https://marc.info/?l=linux-kernel&m=152646464017810&w=2