On Wed, 9 Jan 2019 at 23:43, Sargun Dhillon <[email protected]> wrote: > > On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <[email protected]> wrote: > > > > I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix > > infinite loop in update_blocked_averages() by reverting a9e7f6544b9c > > and put it on top of 4.19.13. In addition to this, I uninlined > > list_add_leaf_cfs_rq for debugging. > > > > This revealed a new bug that we didn't get to because we kept getting > > crashes from the previous issue. When we are running with cgroups that > > are rapidly changing, with CFS bandwidth control, and in addition > > using the cpusets cgroup, we see this crash. Specifically, it seems to > > occur with cgroups that are throttled and we change the allowed > > cpuset.
Thanks for the context, I will try to reproduce the problem and understand how we can stop in the middle of walking to the sched_entity branch with a parent not already added How many cgroup level have you got in you setup ? > > > > This patch from Gabriel should fix the problem: > > > [PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete > > When a child cfs_rq is added to the leaf cfs_rq list before its parent > tmp_alone_branch is set to point to the child in preparation for the > parent being added. > > If the child is deleted before the parent is added then tmp_alone_branch > points to a freed cfs_rq. Any future reference to tmp_alone_branch will > result in a use after free. So, the patch below is a temporary fix that helps to recover from the situation where tmp_alone_branch doesn't finished back to rq->leaf_cfs_rq_list But this situation should not happened at the beginning > > Signed-off-by: Gabriel Hartmann <[email protected]> > Reported-by: Sargun Dhillon <[email protected]> > --- > kernel/sched/fair.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c > index 7137bc343b4a..0987629cbb76 100644 > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct > cfs_rq *cfs_rq) > static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq) > { > if (cfs_rq->on_list) { > + struct rq *rq = rq_of(cfs_rq); > + > + if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list) > + rq->tmp_alone_branch = &rq->leaf_cfs_rq_list; > + > list_del_rcu(&cfs_rq->leaf_cfs_rq_list); > cfs_rq->on_list = 0; > }

