On 3 March 2015 at 14:45, Tejun Heo <[email protected]> wrote: > Hello, > > Found the culprit. Plain wake_up() shouldn't be used on > bit_waitqueues. The following patch should fix the issue. I replaced > the patch in the wq branches.
Yup, this looks good from here. Thanks, Tomeu > Thanks a lot. > > ----- 8< ----- > From bba5a377507efef69214108c0d3790cadfab21d4 Mon Sep 17 00:00:00 2001 > From: Tejun Heo <[email protected]> > Date: Tue, 3 Mar 2015 08:43:09 -0500 > > cancel[_delayed]_work_sync() are implemented using > __cancel_work_timer() which grabs the PENDING bit using > try_to_grab_pending() and then flushes the work item with PENDING set > to prevent the on-going execution of the work item from requeueing > itself. > > try_to_grab_pending() can always grab PENDING bit without blocking > except when someone else is doing the above flushing during > cancelation. In that case, try_to_grab_pending() returns -ENOENT. In > this case, __cancel_work_timer() currently invokes flush_work(). The > assumption is that the completion of the work item is what the other > canceling task would be waiting for too and thus waiting for the same > condition and retrying should allow forward progress without excessive > busy looping > > Unfortunately, this doesn't work if preemption is disabled or the > latter task has real time priority. Let's say task A just got woken > up from flush_work() by the completion of the target work item. If, > before task A starts executing, task B gets scheduled and invokes > __cancel_work_timer() on the same work item, its try_to_grab_pending() > will return -ENOENT as the work item is still being canceled by task A > and flush_work() will also immediately return false as the work item > is no longer executing. This puts task B in a busy loop possibly > preventing task A from executing and clearing the canceling state on > the work item leading to a hang. > > task A task B worker > > executing work > __cancel_work_timer() > try_to_grab_pending() > set work CANCELING > flush_work() > block for work completion > completion, wakes up A > __cancel_work_timer() > while (forever) { > try_to_grab_pending() > -ENOENT as work is being canceled > flush_work() > false as work is no longer executing > } > > This patch removes the possible hang by updating __cancel_work_timer() > to explicitly wait for clearing of CANCELING rather than invoking > flush_work() after try_to_grab_pending() fails with -ENOENT. The > explicit wait uses the matching bit waitqueue for the CANCELING bit. > > Link: http://lkml.kernel.org/g/[email protected] > > v2: v1 used wake_up() on bit_waitqueue() which leads to NULL deref if > the target bit waitqueue has wait_bit_queue's on it. Use > DEFINE_WAIT_BIT() and __wake_up_bit() instead. Reported by Tomeu > Vizoso. > > Signed-off-by: Tejun Heo <[email protected]> > Reported-by: Rabin Vincent <[email protected]> > Cc: Tomeu Vizoso <[email protected]> > Cc: [email protected] > Tested-by: Jesper Nilsson <[email protected]> > Tested-by: Rabin Vincent <[email protected]> > --- > include/linux/workqueue.h | 3 ++- > kernel/workqueue.c | 37 +++++++++++++++++++++++++++++++++---- > 2 files changed, 35 insertions(+), 5 deletions(-) > > diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h > index 74db135..f597846 100644 > --- a/include/linux/workqueue.h > +++ b/include/linux/workqueue.h > @@ -70,7 +70,8 @@ enum { > /* data contains off-queue information when !WORK_STRUCT_PWQ */ > WORK_OFFQ_FLAG_BASE = WORK_STRUCT_COLOR_SHIFT, > > - WORK_OFFQ_CANCELING = (1 << WORK_OFFQ_FLAG_BASE), > + __WORK_OFFQ_CANCELING = WORK_OFFQ_FLAG_BASE, > + WORK_OFFQ_CANCELING = (1 << __WORK_OFFQ_CANCELING), > > /* > * When a work item is off queue, its high bits point to the last > diff --git a/kernel/workqueue.c b/kernel/workqueue.c > index f288493..cfbae1d 100644 > --- a/kernel/workqueue.c > +++ b/kernel/workqueue.c > @@ -2730,17 +2730,37 @@ EXPORT_SYMBOL_GPL(flush_work); > > static bool __cancel_work_timer(struct work_struct *work, bool is_dwork) > { > + wait_queue_head_t *waitq = bit_waitqueue(&work->data, > + __WORK_OFFQ_CANCELING); > unsigned long flags; > int ret; > > do { > ret = try_to_grab_pending(work, is_dwork, &flags); > /* > - * If someone else is canceling, wait for the same event it > - * would be waiting for before retrying. > + * If someone else is already canceling, wait for it to > + * finish. flush_work() doesn't work for PREEMPT_NONE > + * because we may get scheduled between @work's completion > + * and the other canceling task resuming and clearing > + * CANCELING - flush_work() will return false immediately > + * as @work is no longer busy, try_to_grab_pending() will > + * return -ENOENT as @work is still being canceled and the > + * other canceling task won't be able to clear CANCELING as > + * we're hogging the CPU. > + * > + * Explicitly wait for completion using a bit waitqueue. > + * We can't use wait_on_bit() as the CANCELING bit may get > + * recycled to point to pwq if @work gets re-queued. > */ > - if (unlikely(ret == -ENOENT)) > - flush_work(work); > + if (unlikely(ret == -ENOENT)) { > + DEFINE_WAIT_BIT(wait, &work->data, > + __WORK_OFFQ_CANCELING); > + prepare_to_wait(waitq, &wait.wait, > + TASK_UNINTERRUPTIBLE); > + if (work_is_canceling(work)) > + schedule(); > + finish_wait(waitq, &wait.wait); > + } > } while (unlikely(ret < 0)); > > /* tell other tasks trying to grab @work to back off */ > @@ -2749,6 +2769,15 @@ static bool __cancel_work_timer(struct work_struct > *work, bool is_dwork) > > flush_work(work); > clear_work_data(work); > + > + /* > + * Paired with prepare_to_wait() above so that either > + * __wake_up_bit() sees busy waitq here or !work_is_canceling() is > + * visible there. > + */ > + smp_mb(); > + __wake_up_bit(waitq, &work->data, __WORK_OFFQ_CANCELING); > + > return ret; > } > > -- > 2.1.0 > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

