On 10/13/2016 06:58 AM, Vincent Guittot wrote:
> Hi,
>
> On 12 October 2016 at 18:21, Joseph Salisbury
> <[email protected]> wrote:
>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>> On 8 October 2016 at 13:49, Mike Galbraith <[email protected]> wrote:
>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>> On 8 October 2016 at 10:39, Ingo Molnar <[email protected]> wrote:
>>>>>> * Peter Zijlstra <[email protected]> wrote:
>>>>>>
>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>> Hello Peter,
>>>>>>>>
>>>>>>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
>>>>>>>> bisect, it was found that reverting the following commit resolved
>>>>>>>> this bug:
>>>>>>>>
>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>> Author: Peter Zijlstra <[email protected]>
>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>
>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>> This patch only speeds up the update of the task group load in order
>>>>> to reflect the new load balance, but it should not change the final
>>>>> value and, as a result, the final behavior. I will try to reproduce
>>>>> it on my target later today.
>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>> Me too.
>>>
>>> Is it possible to get some dump of /proc/sched_debug while the problem
>>> occurs ?
>>>
>>> Vincent
>>>
>>>> 	-Mike
>> The output from /proc/sched_debug can be seen here:
>> http://paste.ubuntu.com/23312351/
> I have looked at the dump, and there is something very odd with the
> system.slice task group, where the display manager is running.
> system.slice->tg_load_avg is around 381697, but tg_load_avg is normally
> equal to the sum of system.slice[cpu]->tg_load_avg_contrib, whereas the
> sum of system.slice[cpu]->tg_load_avg_contrib is 1013 in our case. We
> can have some differences because the dump of /proc/sched_debug is not
> atomic and some changes can happen in the meantime, but nothing like
> this difference.
>
> The main effect of this very high value is that the weight/prio of the
> sched_entity that represents system.slice in the root cfs_rq is very
> low (lower than a task with the smallest nice prio), so the
> system.slice task group rarely gets the CPU compared to the user.slice
> task group: less than 1% for system.slice, where lightDM and xorg are
> running, compared to 99% for user.slice, where the stress tasks are
> running. This is confirmed by the se->avg.util_avg values of the task
> groups, which reflect how much time each task group is effectively
> running on a CPU:
> system.slice[CPU3].se->avg.util_avg = 8 whereas
> user.slice[CPU3].se->avg.util_avg = 991
>
> This difference in weight/priority explains why the system becomes
> unresponsive. What I can't explain for now is why
> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
> and how the patch can generate this situation.
>
> Is it possible to have a dump of /proc/sched_debug before starting the
> stress command? That would show whether the problem is there from the
> beginning but not visible because the system is not overloaded, or
> whether it only appears when the user starts to load the system.

Here is the dump before stress is started:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy
Here it is after:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy

>
> Thanks,
>
>> Ingo, the latest scheduler bits also still exhibit the bug:
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>
>>
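To make the inconsistency concrete, here is a small user-space model of
the check Vincent describes. This is illustrative C, not kernel code: the
per-CPU contribution values are made up so that they sum to roughly the
1013 reported for system.slice, while tg_load_avg is the value seen in
the buggy dump.

/* Model of the invariant described above: the group-wide tg->load_avg
 * should track the sum of the per-CPU cfs_rq->tg_load_avg_contrib.
 * Plain user-space C with made-up per-CPU values, not kernel code. */
#include <stdio.h>

#define NR_CPUS 4

int main(void)
{
    /* Illustrative per-CPU contributions for system.slice; their sum
     * is ~1013, the value reported in the buggy dump. */
    long contrib[NR_CPUS] = { 260, 255, 248, 250 };
    long tg_load_avg = 381697;  /* value seen in the buggy dump */
    long sum = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        sum += contrib[cpu];

    printf("sum of contribs = %ld, tg->load_avg = %ld\n", sum, tg_load_avg);

    /* Allow some slack: the dump is not atomic, so small differences
     * are expected, but not an inflation of several hundred times. */
    if (tg_load_avg > 2 * sum)
        printf("inconsistent: tg->load_avg is wildly inflated\n");
    return 0;
}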

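And here is a rough sketch of why that inflated tg->load_avg translates
into less than 1% CPU for system.slice. In 4.8-era fair.c, the weight of
a group's sched_entity on each CPU is computed roughly as
tg->shares * cfs_rq_load / tg->load_avg; the real calc_cfs_shares() is
more careful, so treat this as a simplified model with assumed numbers.

/* Simplified model of how an inflated tg->load_avg crushes the weight
 * of the system.slice group entity: the per-CPU group weight is
 * tg->shares scaled by the local cfs_rq load over the group-wide load.
 * Illustrative user-space C, not the actual kernel code. */
#include <stdio.h>

#define MIN_SHARES 2L   /* the kernel clamps the result to a minimum */

static long group_weight(long tg_shares, long cfs_rq_load, long tg_load_avg)
{
    long w = tg_shares * cfs_rq_load;

    if (tg_load_avg)
        w /= tg_load_avg;
    return w < MIN_SHARES ? MIN_SHARES : w;
}

int main(void)
{
    long tg_shares = 1024;   /* default cpu.shares for a task group */
    long cfs_rq_load = 253;  /* assumed per-CPU load of system.slice */

    /* Sane case: tg->load_avg matches the sum of contributions (~1013). */
    printf("expected weight: %ld\n",
           group_weight(tg_shares, cfs_rq_load, 1013));

    /* Buggy case: tg->load_avg inflated to 381697. */
    printf("observed weight: %ld\n",
           group_weight(tg_shares, cfs_rq_load, 381697));
    return 0;
}

With the sane value the group entity keeps a weight in the same range as
an ordinary task, while with the inflated value it collapses to the
minimum, which lines up with the <1% vs 99% split between system.slice
and user.slice described above.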
