On Mon, 10 Apr 2017, Paul E. McKenney wrote:

> On Mon, Apr 10, 2017 at 12:20:53PM -0400, Alan Stern wrote:
> > On Mon, 10 Apr 2017, Paul E. McKenney wrote:
> > 
> > > > But I would like to get this matter settled first.  Is the explicit 
> > > > barrier truly necessary?
> > > 
> > > If you are using wait_event()/wake_up() or friends, the explicit
> > > barrier -is- necessary.  To see this, look at v4.10's wait_event():
> > > 
> > >   #define wait_event(wq, condition)                               \
> > >   do {                                                            \
> > >           might_sleep();                                          \
> > >           if (condition)                                          \
> > >                   break;                                          \
> > >           __wait_event(wq, condition);                            \
> > >   } while (0)
> > > 
> > > As you can see, if the condition is set just before the wait_event()
> > > macro checks it, there is no ordering whatsoever.
> > 
> > This is true, but it is not relevant to the question I was asking.
> 
> Apologies!  What I get for answering email too early on Monday, I guess...
> 
> > >  And if wake_up()
> > > finds nothing to wake up, there is no relevant ordering on that side,
> > > either.
> > > 
> > > So you had better supply your own ordering, period, end of story.
> > 
> > The question is: Exactly what ordering do I need to supply?  The 
> > ordering among my own variables is okay; I know how to deal with that.  
> > But what about the ordering between my variables and current->state?
> 
> The ordering with current->state is sadly not relevant because it is
> only touched if wake_up() actually wakes the process up.

Well, it is _written_ only if wake_up() actually wakes the process up.  
But it is _read_ in every case.

> > For example, __wait_event() calls prepare_to_wait(), which calls
> > set_current_state(), which calls smp_store_mb(), thereby inserting a
> > full memory barrier between setting current->state and checking the
> > condition.  But I didn't see any comparable barrier inserted by
> > wake_up(), between setting the condition and checking task->state.
> > 
> > However, now that I look more closely, I do see that wakeup_process() 
> > calls try_to_wake_up(), which begins with:
> > 
> >     /*
> >      * If we are going to wake up a thread waiting for CONDITION we
> >      * need to ensure that CONDITION=1 done by the caller can not be
> >      * reordered with p->state check below. This pairs with mb() in
> >      * set_current_state() the waiting thread does.
> >      */
> >     smp_mb__before_spinlock();
> >     raw_spin_lock_irqsave(&p->pi_lock, flags);
> >     if (!(p->state & state))
> > 
> > So it does insert a full barrier after all, and there is nothing to 
> > worry about.
> 
> Nice!

It looks like the other wakeup pathways end up funnelling through 
try_to_wake_up(), so this is true in general.

> Hmmm...
> 
> Another valid (and I believe more common) idiom is this:
> 
>       spin_lock(&mylock);
>       changes_that_must_be_visible_to_woken_thread();
>       WRITE_ONCE(need_wake_up, true);
>       spin_unlock(&mylock);
> 
>       ---
> 
>       wait_event(wq, READ_ONCE(need_wake_up));
>       spin_lock(&mylock);
>       access_variables_used_by_waking_thread();
>       spin_unlock(&mylock);
> 
> In this case, the locks do all the required ordering.
> 
> > This also means that the analysis provided by Thinh Nguyen in the 
> > original patch description is wrong.
> 
> And that the bug is elsewhere?

Presumably.  On the other hand, Thinh Nguyen claimed to have narrowed
the problem down to this particular mechanism.  The driver in question
in drivers/usb/gadget/function/f_mass_storage.c.  The waker routines
are bulk_out_complete()/wakeup_thread(), which do:

        // bulk_out_complete()
        bh->state = BH_STATE_FULL;

        // wakeup_thread()
        smp_wmb();      /* ensure the write of bh->state is complete */
        /* Tell the main thread that something has happened */
        common->thread_wakeup_needed = 1;
        if (common->thread_task)
                wake_up_process(common->thread_task);

and the waiters are get_next_command()/sleep_thread(), which do:

// get_next_command()
while (bh->state == BH_STATE_EMPTY) {

        // sleep_thread()
        for (;;) {
                if (can_freeze)
                        try_to_freeze();
                set_current_state(TASK_INTERRUPTIBLE);
                if (signal_pending(current)) {
                        rc = -EINTR;
                        break;
                }
                if (common->thread_wakeup_needed)
                        break;
                schedule();
        }
        __set_current_state(TASK_RUNNING);
        common->thread_wakeup_needed = 0;
        smp_rmb();      /* ensure the latest bh->state is visible */

}

and he said that the problem was caused by the waiter seeing 
thread_wakeup_needed == 0, so the wakeup was getting lost.

Hmmm, I suppose it's possible that the waker's thread_wakeup_needed = 1
could race with the waiter's thread_wakeup_needed = 0.  If there are
two waits in quick succession, the second could get lost.  The pattern
would be:

                                        bh->state = BH_STATE_FULL;
                                        smp_wmb();
        thread_wakeup_needed = 0;       thread_wakeup_needed = 1;
        smp_rmb();
        if (bh->state != BH_STATE_FULL)
                sleep again...

This is the so-called R pattern, and it also needs full memory barriers
on both sides.  The barriers we have are not sufficient.  (This is an
indication that the driver's design needs to be re-thought.)  As it is,
the waiter's thread_wakeup_needed = 0 can overwrite the waker's
thread_wakeup_needed = 1 while the waiter's read of bh->state then
fails to see the waker's write.  (This analysis is similar to but 
different from the one in the patch description.)

To fix this problem, both the smp_rmb() in sleep_thread() and the
smp_wmb() in wakeup_thread() should be changed to smp_mb().

Felipe, was this patch meant to solve the problem you encountered in
your "Memory barrier needed with wake_up_process()?" email thread last
fall?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to