On 10/13/2016 06:58 AM, Vincent Guittot wrote:
> Hi,
>
> On 12 October 2016 at 18:21, Joseph Salisbury
> <[email protected]> wrote:
>> On 10/12/2016 08:20 AM, Vincent Guittot wrote:
>>> On 8 October 2016 at 13:49, Mike Galbraith <[email protected]> wrote:
>>>> On Sat, 2016-10-08 at 13:37 +0200, Vincent Guittot wrote:
>>>>> On 8 October 2016 at 10:39, Ingo Molnar <[email protected]> wrote:
>>>>>> * Peter Zijlstra <[email protected]> wrote:
>>>>>>
>>>>>>> On Fri, Oct 07, 2016 at 03:38:23PM -0400, Joseph Salisbury wrote:
>>>>>>>> Hello Peter,
>>>>>>>>
>>>>>>>> A kernel bug report was opened against Ubuntu [0].  After a kernel
>>>>>>>> bisect, it was found that reverting the following commit resolved
>>>>>>>> this bug:
>>>>>>>>
>>>>>>>> commit 3d30544f02120b884bba2a9466c87dba980e3be5
>>>>>>>> Author: Peter Zijlstra <[email protected]>
>>>>>>>> Date:   Tue Jun 21 14:27:50 2016 +0200
>>>>>>>>
>>>>>>>>     sched/fair: Apply more PELT fixes
>>>>> This patch only speeds up the update of the task group load in order
>>>>> to reflect the new load balance, but it should not change the final
>>>>> value and, as a result, the final behavior. I will try to reproduce
>>>>> it on my target later today.
>>>> FWIW, I tried and failed w/wo autogroup on 4.8 and master.
>>> Me too.
>>>
>>> Is it possible to get some dump of /proc/sched_debug while the problem
>>> occurs ?
>>>
>>> Vincent
>>>
>>>> 	-Mike
>> The output from /proc/sched_debug can be seen here:
>> http://paste.ubuntu.com/23312351/
> I have looked at the dump, and there is something very odd with the
> system.slice task group, where the display manager is running.
> system.slice->tg_load_avg is around 381697, but tg_load_avg is normally
> equal to the sum of system.slice[cpu]->tg_load_avg_contrib, whereas the
> sum of system.slice[cpu]->tg_load_avg_contrib is 1013 in our case. We
> can have some differences because the dump of /proc/sched_debug is not
> atomic and some changes can happen in the meantime, but nothing like
> this difference.
>
> The main effect of this very high value is that the weight/prio of the
> sched_entity that represents system.slice in the root cfs_rq is very
> low (lower than a task with the smallest nice prio), so the
> system.slice task group rarely gets the CPU compared to the user.slice
> task group: less than 1% for system.slice, where lightDM and xorg are
> running, compared to 99% for user.slice, where the stress tasks are
> running. This is confirmed by the se->avg.util_avg values of the task
> groups, which reflect how much time each task group is effectively
> running on a CPU:
> system.slice[CPU3].se->avg.util_avg = 8 whereas
> user.slice[CPU3].se->avg.util_avg = 991
>
> This difference in weight/priority explains why the system becomes
> unresponsive. What I can't explain for now is why
> system.slice->tg_load_avg = 381697 whereas it should be around 1013,
> and how the patch can generate this situation.
>
> Is it possible to have a dump of /proc/sched_debug before starting the
> stress command? That would show whether the problem is there from the
> beginning but not visible because the system is not overloaded, or
> whether it only appears when the user starts to load the system.

Here is the dump before stress is started:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760437/+files/dump_nonbuggy
Here it is after:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1627108/+attachment/4760436/+files/dump_buggy

>
> Thanks,
>
>> Ingo, the latest scheduler bits also still exhibit the bug:
>> git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git
>>
>>
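To make the inconsistency concrete, here is a small user-space model of
the check Vincent describes. This is illustrative C, not kernel code: the
per-CPU contribution values are made up so that they sum to roughly the
1013 reported for system.slice, while tg_load_avg is the value seen in
the buggy dump.

/* Model of the invariant described above: the group-wide tg->load_avg
 * should track the sum of the per-CPU cfs_rq->tg_load_avg_contrib.
 * Plain user-space C with made-up per-CPU values, not kernel code. */
#include <stdio.h>

#define NR_CPUS 4

int main(void)
{
    /* Illustrative per-CPU contributions for system.slice; their sum
     * is ~1013, the value reported in the buggy dump. */
    long contrib[NR_CPUS] = { 260, 255, 248, 250 };
    long tg_load_avg = 381697;  /* value seen in the buggy dump */
    long sum = 0;
    int cpu;

    for (cpu = 0; cpu < NR_CPUS; cpu++)
        sum += contrib[cpu];

    printf("sum of contribs = %ld, tg->load_avg = %ld\n", sum, tg_load_avg);

    /* Allow some slack: the dump is not atomic, so small differences
     * are expected, but not an inflation of several hundred times. */
    if (tg_load_avg > 2 * sum)
        printf("inconsistent: tg->load_avg is wildly inflated\n");
    return 0;
}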

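And here is a rough sketch of why that inflated tg->load_avg translates
into less than 1% CPU for system.slice. In 4.8-era fair.c, the weight of
a group's sched_entity on each CPU is computed roughly as
tg->shares * cfs_rq_load / tg->load_avg; the real calc_cfs_shares() is
more careful, so treat this as a simplified model with assumed numbers.

/* Simplified model of how an inflated tg->load_avg crushes the weight
 * of the system.slice group entity: the per-CPU group weight is
 * tg->shares scaled by the local cfs_rq load over the group-wide load.
 * Illustrative user-space C, not the actual kernel code. */
#include <stdio.h>

#define MIN_SHARES 2L   /* the kernel clamps the result to a minimum */

static long group_weight(long tg_shares, long cfs_rq_load, long tg_load_avg)
{
    long w = tg_shares * cfs_rq_load;

    if (tg_load_avg)
        w /= tg_load_avg;
    return w < MIN_SHARES ? MIN_SHARES : w;
}

int main(void)
{
    long tg_shares = 1024;   /* default cpu.shares for a task group */
    long cfs_rq_load = 253;  /* assumed per-CPU load of system.slice */

    /* Sane case: tg->load_avg matches the sum of contributions (~1013). */
    printf("expected weight: %ld\n",
           group_weight(tg_shares, cfs_rq_load, 1013));

    /* Buggy case: tg->load_avg inflated to 381697. */
    printf("observed weight: %ld\n",
           group_weight(tg_shares, cfs_rq_load, 381697));
    return 0;
}

With the sane value the group entity keeps a weight in the same range as
an ordinary task, while with the inflated value it collapses to the
minimum, which lines up with the <1% vs 99% split between system.slice
and user.slice described above.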
