Hello Stanislaw,

On Fri, 2016-08-12 at 14:10 +0200, Stanislaw Gruszka wrote:
>
> I measured (partial) revert performance on 4.7 using mmtest instructions
> from Giovanni and also tested some other possible fix (draft version):
>
> diff --git a/kernel/sched/cputime.c b/kernel/sched/cputime.c
> index 75f98c5..54fdf6d 100644
> --- a/kernel/sched/cputime.c
> +++ b/kernel/sched/cputime.c
> @@ -294,6 +294,8 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  	unsigned int seq, nextseq;
>  	unsigned long flags;
>  
> +	(void) task_sched_runtime(tsk);
> +
>  	rcu_read_lock();
>  	/* Attempt a lockless read on the first round. */
>  	nextseq = 0;
> @@ -308,7 +310,7 @@ void thread_group_cputime(struct task_struct *tsk, struct task_cputime *times)
>  			task_cputime(t, &utime, &stime);
>  			times->utime += utime;
>  			times->stime += stime;
> -			times->sum_exec_runtime += task_sched_runtime(t);
> +			times->sum_exec_runtime += t->se.sum_exec_runtime;
>  		}
>  		/* If lockless access failed, take the lock. */
>  		nextseq = 1;
> ---
> mmtest benchmark results are below (full compare-kernels.sh output is in
> attachment):
>
> vanila-4.7       revert           prefetch         patch
> 4.74 (  0.00%)   3.04 ( 35.93%)   4.09 ( 13.81%)   1.30 ( 72.59%)
> 5.49 (  0.00%)   5.00 (  8.97%)   5.34 (  2.72%)   1.03 ( 81.16%)
> 6.12 (  0.00%)   4.91 ( 19.73%)   5.97 (  2.40%)   0.90 ( 85.27%)
> 6.68 (  0.00%)   4.90 ( 26.66%)   6.02 (  9.75%)   0.88 ( 86.89%)
> 7.21 (  0.00%)   5.13 ( 28.85%)   6.70 (  7.09%)   0.87 ( 87.91%)
> 7.66 (  0.00%)   5.22 ( 31.80%)   7.17 (  6.39%)   0.92 ( 88.01%)
> 7.91 (  0.00%)   5.36 ( 32.22%)   7.30 (  7.72%)   0.95 ( 87.97%)
> 7.95 (  0.00%)   5.35 ( 32.73%)   7.34 (  7.66%)   1.06 ( 86.66%)
> 8.00 (  0.00%)   5.33 ( 33.31%)   7.38 (  7.73%)   1.13 ( 85.82%)
> 5.61 (  0.00%)   3.55 ( 36.76%)   4.53 ( 19.23%)   2.29 ( 59.28%)
> 5.66 (  0.00%)   4.32 ( 23.79%)   4.75 ( 16.18%)   3.65 ( 35.46%)
> 5.98 (  0.00%)   4.97 ( 16.87%)   5.96 (  0.35%)   3.62 ( 39.40%)
> 6.58 (  0.00%)   4.94 ( 24.93%)   6.04 (  8.32%)   3.63 ( 44.89%)
> 7.19 (  0.00%)   5.18 ( 28.01%)   6.68 (  7.13%)   3.65 ( 49.22%)
> 7.67 (  0.00%)   5.27 ( 31.29%)   7.16 (  6.63%)   3.62 ( 52.76%)
> 7.88 (  0.00%)   5.36 ( 31.98%)   7.28 (  7.58%)   3.65 ( 53.71%)
> 7.99 (  0.00%)   5.39 ( 32.52%)   7.40 (  7.42%)   3.65 ( 54.25%)
>
> Patch works because we we update sum_exec_runtime on current thread
> what assure we see proper sum_exec_runtime value on different CPUs. I
> tested it with reproducers from commits 6e998916dfe32 and d670ec13178d0,
> patch did not break them. I'm going to run some other test.
>
> Patch is draft version for early review, task_sched_runtime() will be
> simplified (since it's called only current thread) and possibly split
> into two functions: one that call update_curr() and other that return
> sum_exec_runtime (assure it's consistent on 32 bit arches).
>
> Stanislaw

Thank you for having a look at this. Your patch performs very well, even
better than the pre-6e998916dfe3 numbers I was aiming for. I confirm your
results on my test machine (Sandy Bridge, 32 cores, 2 NUMA nodes). I didn't
apply it on the very latest 4.8-rc but used what I had handy for comparison
(i.e. 4.7-rc7 and the parent of 6e998916dfe3). As I said, my measurements
match yours (my tables follow); looks like your diff cures the problem while
mine cures the symptoms.

clock_gettime():

threads   4.7-rc7   3.18-rc3             4.7-rc7 + prefetch   4.7-rc7 + Stanislaw
                    (pre-6e998916dfe3)
2         3.48      2.23 ( 35.68%)       3.06 ( 11.83%)       1.08 ( 68.81%)
5         3.33      2.83 ( 14.84%)       3.25 (  2.40%)       0.71 ( 78.55%)
8         3.37      2.84 ( 15.80%)       3.26 (  3.30%)       0.56 ( 83.49%)
12        3.32      3.09 (  6.69%)       3.37 ( -1.60%)       0.42 ( 87.28%)
21        4.01      3.14 ( 21.70%)       3.90 (  2.74%)       0.35 ( 91.35%)
30        3.63      3.28 (  9.75%)       3.36 (  7.41%)       0.28 ( 92.23%)
48        3.71      3.02 ( 18.69%)       3.11 ( 16.27%)       0.39 ( 89.39%)
79        3.75      2.88 ( 23.23%)       3.16 ( 15.74%)       0.46 ( 87.76%)
110       3.81      2.95 ( 22.62%)       3.25 ( 14.80%)       0.56 ( 85.41%)
128       3.88      3.05 ( 21.28%)       3.31 ( 14.76%)       0.62 ( 84.10%)

times():

threads   4.7-rc7   3.18-rc3             4.7-rc7 + prefetch   4.7-rc7 + Stanislaw
                    (pre-6e998916dfe3)
2         3.65      2.27 ( 37.94%)       3.25 ( 11.03%)       1.62 ( 55.71%)
5         3.45      2.78 ( 19.34%)       3.17 (  7.92%)       2.33 ( 32.28%)
8         3.52      2.79 ( 20.66%)       3.22 (  8.69%)       2.06 ( 41.44%)
12        3.29      3.02 (  8.33%)       3.36 ( -2.04%)       2.00 ( 39.18%)
21        4.07      3.10 ( 23.86%)       3.92 (  3.78%)       2.07 ( 49.18%)
30        3.87      3.33 ( 13.80%)       3.40 ( 12.17%)       1.89 ( 51.12%)
48        3.79      2.96 ( 21.94%)       3.16 ( 16.61%)       1.69 ( 55.46%)
79        3.88      2.88 ( 25.82%)       3.28 ( 15.42%)       1.60 ( 58.81%)
110       3.90      2.98 ( 23.73%)       3.38 ( 13.35%)       1.73 ( 55.61%)
128       4.00      3.10 ( 22.40%)       3.38 ( 15.45%)       1.66 ( 58.52%)

Regards,
Giovanni