On Tue, Jun 13, 2017 at 04:58:37PM -0400, Tejun Heo wrote:
> Hello, Paul.
>
> On Fri, May 05, 2017 at 10:11:59AM -0700, Paul E. McKenney wrote:
> > Just following up... I have hit this bug a couple of times over the
> > past few days. Anything I can do to help?
>
> My apologies for dropping the ball on this. I've gone over the hot
> plug code in workqueue several times but can't really find how this
> would happen. Can you please apply the following patch and see what
> it says when the problem happens?

I have fired it up, thank you!

Last time I saw one failure in 21 hours of test runs, so I have kicked
off 42 one-hour test runs. Will see what happens tomorrow morning,
Pacific Time.

							Thanx, Paul

> Thanks.
>
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index c74bf39ef764..bd2ce3cbfb41 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1691,13 +1691,20 @@ static struct worker *alloc_worker(int node)
>  static void worker_attach_to_pool(struct worker *worker,
>  				   struct worker_pool *pool)
>  {
> +	int ret;
> +
>  	mutex_lock(&pool->attach_mutex);
>
>  	/*
>  	 * set_cpus_allowed_ptr() will fail if the cpumask doesn't have any
>  	 * online CPUs. It'll be re-applied when any of the CPUs come up.
>  	 */
> -	set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +	ret = set_cpus_allowed_ptr(worker->task, pool->attrs->cpumask);
> +
> +	WARN(ret && !(pool->flags & POOL_DISASSOCIATED),
> +	     "set_cpus_allowed_ptr failed, ret=%d pool->cpu/flags=%d/0x%x cpumask=%*pbl online=%*pbl active=%*pbl\n",
> +	     ret, pool->cpu, pool->flags, cpumask_pr_args(pool->attrs->cpumask),
> +	     cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
>
>  	/*
>  	 * The pool->attach_mutex ensures %POOL_DISASSOCIATED remains
> @@ -2037,8 +2044,11 @@ __acquires(&pool->lock)
>  	lockdep_copy_map(&lockdep_map, &work->lockdep_map);
>  #endif
>  	/* ensure we're on the correct CPU */
> -	WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> -		     raw_smp_processor_id() != pool->cpu);
> +	if (WARN_ON_ONCE(!(pool->flags & POOL_DISASSOCIATED) &&
> +			 raw_smp_processor_id() != pool->cpu))
> +		printk_once("XXX workfn=%pf pool->cpu/flags=%d/0x%x curcpu=%d online=%*pbl active=%*pbl\n",
> +			    work->func, pool->cpu, pool->flags, raw_smp_processor_id(),
> +			    cpumask_pr_args(cpu_online_mask), cpumask_pr_args(cpu_active_mask));
>
>  	/*
>  	 * A single work shouldn't be executed concurrently by
>