Looks good to me. There is one test in show_pwq() """ worker == pwq->wq->rescuer ? "(RESCUER)" : "", """ I'm wondering if it needs to be updated to """ worker->rescue_wq ? "(RESCUER)" : "", """
And document "/* MD: rescue worker */" might be better than current "/* I: rescue worker */", although ->rescuer can be accessed without wq_mayday_lock lock in some code. Reviewed-by: Lai Jiangshan <jiangshan...@gmail.com> On Thu, Sep 19, 2019 at 9:43 AM Tejun Heo <t...@kernel.org> wrote: > > Before actually destrying a workqueue, destroy_workqueue() checks > whether it's actually idle. If it isn't, it prints out a bunch of > warning messages and leaves the workqueue dangling. It unfortunately > has a couple issues. > > * Mayday list queueing increments pwq's refcnts which gets detected as > busy and fails the sanity checks. However, because mayday list > queueing is asynchronous, this condition can happen without any > actual work items left in the workqueue. > > * Sanity check failure leaves the sysfs interface behind too which can > lead to init failure of newer instances of the workqueue. > > This patch fixes the above two by > > * If a workqueue has a rescuer, disable and kill the rescuer before > sanity checks. Disabling and killing is guaranteed to flush the > existing mayday list. > > * Remove sysfs interface before sanity checks. > > Signed-off-by: Tejun Heo <t...@kernel.org> > Reported-by: Marcin Pawlowski <mpawlow...@fb.com> > Reported-by: "Williams, Gerald S" <gerald.s.willi...@intel.com> > Cc: sta...@vger.kernel.org > --- > Applying to wq/for-5.4. > > Thanks. > > kernel/workqueue.c | 24 +++++++++++++++++++----- > 1 file changed, 19 insertions(+), 5 deletions(-) > > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index 95aea04ff722..73e3ea9e31d3 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -4318,9 +4318,28 @@ void destroy_workqueue(struct workqueue_struct *wq) > struct pool_workqueue *pwq; > int node; > > + /* > + * Remove it from sysfs first so that sanity check failure doesn't > + * lead to sysfs name conflicts. > + */ > + workqueue_sysfs_unregister(wq); > + > /* drain it before proceeding with destruction */ > drain_workqueue(wq); > > + /* kill rescuer, if sanity checks fail, leave it w/o rescuer */ > + if (wq->rescuer) { > + struct worker *rescuer = wq->rescuer; > + > + /* this prevents new queueing */ > + spin_lock_irq(&wq_mayday_lock); > + wq->rescuer = NULL; > + spin_unlock_irq(&wq_mayday_lock); > + > + /* rescuer will empty maydays list before exiting */ > + kthread_stop(rescuer->task); > + } > + > /* sanity checks */ > mutex_lock(&wq->mutex); > for_each_pwq(pwq, wq) { > @@ -4352,11 +4371,6 @@ void destroy_workqueue(struct workqueue_struct *wq) > list_del_rcu(&wq->list); > mutex_unlock(&wq_pool_mutex); > > - workqueue_sysfs_unregister(wq); > - > - if (wq->rescuer) > - kthread_stop(wq->rescuer->task); > - > if (!(wq->flags & WQ_UNBOUND)) { > wq_unregister_lockdep(wq); > /* > -- > 2.17.1 >