On Sat, Jun 17, 2017 at 07:53:14AM -0400, Tejun Heo wrote:
> Hello,
> 
> On Fri, Jun 16, 2017 at 10:36:58AM -0700, Paul E. McKenney wrote:
> > And no test failures from yesterday evening.  So it looks like we get
> > somewhere on the order of one failure per 138 hours of TREE07 rcutorture
> > runtime with your printk() in the mix.
> >
> > Was the above output from your printk() output of any help?
> 
> Yeah, if my suspicion is correct, it'd require new kworker creation
> racing against CPU offline, which would explain why it's so difficult
> to repro.  Can you please see whether the following patch resolves the
> issue?

That could explain why only Steve Rostedt and I saw the issue.  As far
as I know, we are the only ones who regularly run CPU-hotplug stress
tests.  ;-)

I have a weekend-long run going, but will give this a shot overnight on
Monday, Pacific Time.  Thank you for putting it together, and I am looking
forward to seeing what it does!

                                                        Thanx, Paul

> Thanks.
> 
> diff --git a/kernel/sched/core.c b/kernel/sched/core.c
> index 803c3bc274c4..1500217ce4b4 100644
> --- a/kernel/sched/core.c
> +++ b/kernel/sched/core.c
> @@ -980,8 +980,13 @@ struct migration_arg {
>  static struct rq *__migrate_task(struct rq *rq, struct rq_flags *rf,
>                                struct task_struct *p, int dest_cpu)
>  {
> -     if (unlikely(!cpu_active(dest_cpu)))
> -             return rq;
> +     if (p->flags & PF_KTHREAD) {
> +             if (unlikely(!cpu_online(dest_cpu)))
> +                     return rq;
> +     } else {
> +             if (unlikely(!cpu_active(dest_cpu)))
> +                     return rq;
> +     }
> 
>       /* Affinity changed (again). */
>       if (!cpumask_test_cpu(dest_cpu, &p->cpus_allowed))
> 
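
For readers following along, here is a minimal userspace sketch of the
check the quoted patch introduces.  It is a model under stated assumptions,
not kernel code: task_model, online_model[], active_model[], set_cpu_state()
and migration_allowed() are hypothetical names standing in for
struct task_struct, cpu_online(), cpu_active() and the early-return test in
__migrate_task().  It relies on the fact that a CPU going offline is marked
inactive while it is still online, which is the window a newly created
kworker would need to land in.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL 4          /* hypothetical: size of the toy CPU set */
#define PF_KTHREAD    0x00200000 /* flag bit marking a kernel thread */

/* Hypothetical stand-in for the one task_struct field the check reads. */
struct task_model {
	unsigned int flags;
};

/* Toy replacements for the kernel's online and active CPU masks. */
static bool online_model[NR_CPUS_MODEL];
static bool active_model[NR_CPUS_MODEL];

/* Set both states of one CPU in the model. */
static void set_cpu_state(int cpu, bool online, bool active)
{
	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	online_model[cpu] = online;
	active_model[cpu] = active;
}

/*
 * Mirrors the patched test: kernel threads only need the destination
 * CPU to be online, all other tasks need it to be active.  Returning
 * false corresponds to the early "return rq" in the patch.
 */
static bool migration_allowed(const struct task_model *p, int dest_cpu)
{
	if (p->flags & PF_KTHREAD)
		return online_model[dest_cpu];
	return active_model[dest_cpu];
}
```

With a CPU in the online-but-inactive state, migration_allowed() accepts a
task with PF_KTHREAD set and rejects one without it, which is the behavior
change the patch is testing for.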
