On Mon, 10 Apr 2017, Paul E. McKenney wrote:
> On Mon, Apr 10, 2017 at 12:20:53PM -0400, Alan Stern wrote:
> > On Mon, 10 Apr 2017, Paul E. McKenney wrote:
> >
> > > > But I would like to get this matter settled first. Is the explicit
> > > > barrier truly necessary?
> > >
> > > If you are using wait_event()/wake_up() or friends, the explicit
> > > barrier -is- necessary. To see this, look at v4.10's wait_event():
> > >
> > > #define wait_event(wq, condition) \
> > > do { \
> > > 	might_sleep(); \
> > > 	if (condition) \
> > > 		break; \
> > > 	__wait_event(wq, condition); \
> > > } while (0)
> > >
> > > As you can see, if the condition is set just before the wait_event()
> > > macro checks it, there is no ordering whatsoever.
> >
> > This is true, but it is not relevant to the question I was asking.
>
> Apologies! What I get for answering email too early on Monday, I guess...
>
> > > And if wake_up()
> > > finds nothing to wake up, there is no relevant ordering on that side,
> > > either.
> > >
> > > So you had better supply your own ordering, period, end of story.
> >
> > The question is: Exactly what ordering do I need to supply? The
> > ordering among my own variables is okay; I know how to deal with that.
> > But what about the ordering between my variables and current->state?
>
> The ordering with current->state is sadly not relevant because it is
> only touched if wake_up() actually wakes the process up.

Well, it is _written_ only if wake_up() actually wakes the process up.
But it is _read_ in every case.

> > For example, __wait_event() calls prepare_to_wait(), which calls
> > set_current_state(), which calls smp_store_mb(), thereby inserting a
> > full memory barrier between setting current->state and checking the
> > condition. But I didn't see any comparable barrier inserted by
> > wake_up(), between setting the condition and checking task->state.
> >
> > However, now that I look more closely, I do see that wake_up_process()
> > calls try_to_wake_up(), which begins with:
> >
> > 	/*
> > 	 * If we are going to wake up a thread waiting for CONDITION we
> > 	 * need to ensure that CONDITION=1 done by the caller can not be
> > 	 * reordered with p->state check below. This pairs with mb() in
> > 	 * set_current_state() the waiting thread does.
> > 	 */
> > 	smp_mb__before_spinlock();
> > 	raw_spin_lock_irqsave(&p->pi_lock, flags);
> > 	if (!(p->state & state))
> >
> > So it does insert a full barrier after all, and there is nothing to
> > worry about.
>
> Nice!

It looks like the other wakeup pathways end up funnelling through
try_to_wake_up(), so this is true in general.
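
For what it's worth, here is a rough userspace sketch of that pairing,
using C11 atomics in place of the kernel primitives. Everything in it
is made up for illustration: task_state, condition, TASK_RUNNING_U,
TASK_SLEEPING_U, waiter_should_sleep() and waker_signal() are
hypothetical names, and atomic_thread_fence(memory_order_seq_cst)
merely stands in for the full barriers in set_current_state() and
try_to_wake_up(). It is not the scheduler's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for task->state values. */
enum { TASK_RUNNING_U = 0, TASK_SLEEPING_U = 1 };

static atomic_int task_state = TASK_RUNNING_U;	/* plays the role of current->state */
static atomic_int condition;			/* plays the role of CONDITION */

/*
 * Waiter side: announce the intent to sleep, full barrier, then check
 * the condition. Returns true if the caller should go on to block.
 */
bool waiter_should_sleep(void)
{
	atomic_store_explicit(&task_state, TASK_SLEEPING_U,
			      memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* like set_current_state() */
	if (atomic_load_explicit(&condition, memory_order_relaxed)) {
		/* Condition already set: cancel the sleep. */
		atomic_store_explicit(&task_state, TASK_RUNNING_U,
				      memory_order_relaxed);
		return false;
	}
	return true;
}

/*
 * Waker side: set the condition, full barrier, then check the state.
 * Returns true if a sleeping waiter was found and marked runnable.
 */
bool waker_signal(void)
{
	atomic_store_explicit(&condition, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* like try_to_wake_up() */
	if (atomic_load_explicit(&task_state, memory_order_relaxed) !=
	    TASK_SLEEPING_U)
		return false;
	atomic_store_explicit(&task_state, TASK_RUNNING_U,
			      memory_order_relaxed);
	return true;
}
```

With a full fence on each side, at least one of the two loads has to
observe the other side's store: either the waiter sees the condition
and doesn't sleep, or the waker sees the sleeping state and issues the
wakeup.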

> Hmmm...
>
> Another valid (and I believe more common) idiom is this:
>
> 	spin_lock(&mylock);
> 	changes_that_must_be_visible_to_woken_thread();
> 	WRITE_ONCE(need_wake_up, true);
> 	spin_unlock(&mylock);
>
> 	---
>
> 	wait_event(wq, READ_ONCE(need_wake_up));
> 	spin_lock(&mylock);
> 	access_variables_used_by_waking_thread();
> 	spin_unlock(&mylock);
>
> In this case, the locks do all the required ordering.
>
> > This also means that the analysis provided by Thinh Nguyen in the
> > original patch description is wrong.
>
> And that the bug is elsewhere?

Presumably. On the other hand, Thinh Nguyen claimed to have narrowed
the problem down to this particular mechanism. The driver in question
is drivers/usb/gadget/function/f_mass_storage.c. The waker routines
are bulk_out_complete()/wakeup_thread(), which do:

	// bulk_out_complete()
	bh->state = BH_STATE_FULL;

	// wakeup_thread()
	smp_wmb();	/* ensure the write of bh->state is complete */
	/* Tell the main thread that something has happened */
	common->thread_wakeup_needed = 1;
	if (common->thread_task)
		wake_up_process(common->thread_task);

and the waiters are get_next_command()/sleep_thread(), which do:

	// get_next_command()
	while (bh->state == BH_STATE_EMPTY) {

		// sleep_thread()
		for (;;) {
			if (can_freeze)
				try_to_freeze();
			set_current_state(TASK_INTERRUPTIBLE);
			if (signal_pending(current)) {
				rc = -EINTR;
				break;
			}
			if (common->thread_wakeup_needed)
				break;
			schedule();
		}
		__set_current_state(TASK_RUNNING);
		common->thread_wakeup_needed = 0;
		smp_rmb();	/* ensure the latest bh->state is visible */
	}

and he said that the problem was caused by the waiter seeing
thread_wakeup_needed == 0, so the wakeup was getting lost.

Hmmm, I suppose it's possible that the waker's thread_wakeup_needed = 1
could race with the waiter's thread_wakeup_needed = 0. If there are
two wakeups in quick succession, the second could get lost. The pattern
would be (waiter on the left, waker on the right):

	                                  bh->state = BH_STATE_FULL;
	                                  smp_wmb();
	thread_wakeup_needed = 0;         thread_wakeup_needed = 1;
	smp_rmb();
	if (bh->state != BH_STATE_FULL)
		sleep again...

This is the so-called R pattern, and it also needs full memory barriers
on both sides. The barriers we have are not sufficient. (This is an
indication that the driver's design needs to be re-thought.) As it is,
the waiter's thread_wakeup_needed = 0 can overwrite the waker's
thread_wakeup_needed = 1 while the waiter's read of bh->state then
fails to see the waker's write. (This analysis is similar to but
different from the one in the patch description.)

To fix this problem, both the smp_rmb() in sleep_thread() and the
smp_wmb() in wakeup_thread() should be changed to smp_mb().
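
To make the waiter's half of the argument concrete, here is a toy
userspace model, not kernel code. It enumerates every interleaving of
the four memory events above and counts the ones that lose the wakeup.
All the names (enum op, lost_wakeups(), and so on) are hypothetical.
The model assumes the waker's two stores stay in order, and it treats
"no full barrier" as the waiter's store still sitting in a store buffer
when its load executes. It therefore covers only the waiter side and
says nothing about why the waker also needs smp_mb() on more weakly
ordered architectures.

```c
#include <assert.h>
#include <stdbool.h>

/* Memory events in the toy model (all names are hypothetical). */
enum op {
	WAKER_STORE_STATE,	/* bh->state = BH_STATE_FULL */
	WAKER_STORE_FLAG,	/* thread_wakeup_needed = 1 */
	WAITER_COMMIT_FLAG,	/* thread_wakeup_needed = 0 reaches memory */
	WAITER_LOAD_STATE,	/* waiter reads bh->state */
};

/*
 * Count the interleavings in which the wakeup is lost, i.e. the flag
 * ends up 0 and the waiter still saw the buffer as not full.
 */
int lost_wakeups(bool waiter_full_barrier)
{
	const enum op waker[2] = { WAKER_STORE_STATE, WAKER_STORE_FLAG };
	enum op waiter[2];
	int lost = 0;

	if (waiter_full_barrier) {
		/* Full barrier: the store is visible before the load runs. */
		waiter[0] = WAITER_COMMIT_FLAG;
		waiter[1] = WAITER_LOAD_STATE;
	} else {
		/* The load runs while the store is still buffered. */
		waiter[0] = WAITER_LOAD_STATE;
		waiter[1] = WAITER_COMMIT_FLAG;
	}

	/* Each 4-bit mask with two bits set picks the waker's two slots. */
	for (unsigned int mask = 0; mask < 16; mask++) {
		int bits = 0, wi = 0, ti = 0;
		int state_full = 0, flag = 1, seen_full = 0;

		for (int i = 0; i < 4; i++)
			bits += (mask >> i) & 1;
		if (bits != 2)
			continue;

		for (int i = 0; i < 4; i++) {
			enum op op = ((mask >> i) & 1) ? waker[wi++]
						       : waiter[ti++];

			switch (op) {
			case WAKER_STORE_STATE:
				state_full = 1;
				break;
			case WAKER_STORE_FLAG:
				flag = 1;
				break;
			case WAITER_COMMIT_FLAG:
				flag = 0;
				break;
			case WAITER_LOAD_STATE:
				seen_full = state_full;
				break;
			}
		}
		assert(wi == 2 && ti == 2);

		if (flag == 0 && !seen_full)
			lost++;
	}
	return lost;
}
```

In this model lost_wakeups(true) finds no losing interleaving, whereas
lost_wakeups(false) finds one: the waiter's load, then both of the
waker's stores, then the waiter's delayed store of 0.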

Felipe, was this patch meant to solve the problem you encountered in
your "Memory barrier needed with wake_up_process()?" email thread last
fall?

Alan Stern