On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> Hello,
>
> On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > And no test failures from yesterday evening.  So it looks like we get
> > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > runtime with your printk() in the mix.
> >
> > Was the above output from your printk() output of any help?
>
> Yeah, if my suspicion is correct, it'd require new kworker creation
> racing against CPU offline, which would explain why it's so difficult
> to repro.  Can you please see whether the following patch resolves the
> issue?

That could explain why only Steve Rostedt and I saw the issue.  As far
as I know, we are the only ones who regularly run CPU-hotplug stress
tests.  ;-)

I have a weekend-long run going, but will give this a shot overnight on
Monday, Pacific Time.  Thank you for putting it together, looking forward
to seeing what it does!

							Thanx, Paul

> Thanks.
>
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1500217ce4b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,8 +980,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>  				 struct task_struct *p, int dest_cpu)
>  {
> -	if (unlikely(!cpu_active(dest_cpu)))
> -		return rq;
> +	if (p->flags & PF_KTHREAD) {
> +		if (unlikely(!cpu_online(dest_cpu)))
> +			return rq;
> +	} else {
> +		if (unlikely(!cpu_active(dest_cpu)))
> +			return rq;
> +	}
>
>  	/* Affinity changed (again). */
>  	if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
>
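
For anyone following along, the check in the quoted patch can be modelled
in user space.  This is only a rough sketch and not kernel code: every
name with a model_/MODEL_ prefix is made up for illustration, the flag
value is a stand-in rather than the kernel's PF_KTHREAD, and the two
bitmasks merely imitate the online and active CPU masks.  It assumes the
window the patch appears to target, where a CPU going offline is still
online but has already been marked inactive.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS    8
#define MODEL_PF_KTHREAD 0x1u	/* stand-in flag bit, not the kernel's value */

/* Hypothetical stand-in for the kernel's online/active CPU masks. */
struct model_cpu_state {
	unsigned int online_mask;
	unsigned int active_mask;
};

static bool model_cpu_online(const struct model_cpu_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (s->online_mask >> cpu) & 1u;
}

static bool model_cpu_active(const struct model_cpu_state *s, int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);
	return (s->active_mask >> cpu) & 1u;
}

/*
 * Mirrors the branch added by the patch: kernel threads only need the
 * destination CPU to be online, all other tasks need it to be active.
 */
static bool model_migration_allowed(const struct model_cpu_state *s,
				    unsigned int task_flags, int dest_cpu)
{
	if (task_flags & MODEL_PF_KTHREAD)
		return model_cpu_online(s, dest_cpu);
	return model_cpu_active(s, dest_cpu);
}
```

With CPU 1 online but not active, the model lets a task carrying
MODEL_PF_KTHREAD migrate there and refuses an ordinary task, which is
the behavioural difference the patch introduces.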