On Mon, 10 Apr 2017, Felipe Balbi wrote:
> Hi,
>
> Thinh Nguyen <[email protected]> writes:
> > f_mass_storage has a memory barrier issue in its sleep and wake
> > functions that can cause a deadlock. This results in intermittent hangs
> > during MSC file transfer; the host has to reset the device after
> > receiving no response in order to resume the transfer. The issue is
> > seen when dwc3 processes 2 transfer-in-progress events at the same
> > time, invoking the completion handlers for CSW and CBW, and whether it
> > occurs depends on system timing and latency.
> >
> > To increase the chance of hitting this issue, you can force the dwc3
> > driver to wait and process those 2 events at once by adding a small
> > delay (~100us) in dwc3_check_event_buf() whenever the request is for
> > CSW, and then reading the event count again. Avoid debugging with
> > printk and ftrace, as the extra delays and memory barriers will mask
> > this issue.
> >
> > Scenario which can lead to failure:
> > -----------------------------------
> > 1) The main thread sleeps and waits for the next command.
> > 2) bulk_in_complete() wakes up main thread for CSW.
> > 3) bulk_out_complete() tries to wake up the running main thread for CBW.
> > 4) thread_wakeup_needed is not loaded with the correct value in sleep_thread().
> > 5) Main thread goes to sleep.
> >
> > See more detail below:
> >
> > Note the 2 critical variables.
> > * common->thread_wakeup_needed
> > * bh->state
> >
> > Eval'd | CPU0 (main thread) | CPU1
> > --------+------------------------------------------+----------------------
> > | get_next_command() |
> > | ... |
> > while(1)| while (bh->state != BUF_STATE_FULL) { |
> > | for (;;) { |
> > | set_current_state(TASK_INTERRUPTIBLE);|
> > if(0) | if (thread_wakeup_needed) |
> > | schedule() |
> > | | bulk_in_complete() {
> > | | smp_wmb();
> > | | bh->state = BUF_STATE_EMPTY
> > | | smp_wmb();
> > | | thread_wakeup_needed = 1
> > | | wake_up_process()
> > | | }
> > | /* 2nd for loop iteration */ |
> > | for (;;) { |
> > | set_current_state(TASK_INTERRUPTIBLE);|
> > if(1) | if (thread_wakeup_needed) |
> > | break |
> > | } |
> > | | bulk_out_complete() {
> > | | smp_wmb();
> > | __set_current_state(TASK_RUNNING); |
> > | thread_wakeup_needed = 0; |
> > | smp_rmb(); |
> > | |
> > | /* 2nd while loop iteration */ |
> > while(1)| while (bh->state != BUF_STATE_FULL) { |
> > | | bh->state = BUF_STATE_FULL;
> > | | smp_wmb();
> > | -->| thread_wakeup_needed = 1
> > | | wake_up_process()
> > | | }
> > | for (;;) { |
> > | set_current_state(TASK_INTERRUPTIBLE);|
> > if(0) | if (thread_wakeup_needed) |<--
> > | schedule() |
> > | |
> > | <THREAD GOES TO SLEEP AND WAITS> |
> >
> > The latest value of thread_wakeup_needed is not guaranteed to be
> > visible when it is checked. Note that this happens when
> > wake_up_process() is called on an already-running process.
> >
> > This patch fixes the issue by explicitly using smp_mb() after the
> > clearing of thread_wakeup_needed in the sleep function. It serializes
> > the order in which thread_wakeup_needed is stored and loaded, thus
> > preventing the sleeper from reading a stale value when checking it.
> >
> > However, a better solution in the future would be to use the
> > wait_queue mechanism, which takes care of the memory barriers between
> > waker and waiter.
> >
> > See the DEFINE_WAIT_FUNC comment in kernel/sched/wait.c, as this
> > solution is similar to its implementation.
> >
> > Signed-off-by: Thinh Nguyen <[email protected]>
> > ---
> > drivers/usb/gadget/function/f_mass_storage.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/usb/gadget/function/f_mass_storage.c b/drivers/usb/gadget/function/f_mass_storage.c
> > index 4c8aacc232c0..f40433eb8259 100644
> > --- a/drivers/usb/gadget/function/f_mass_storage.c
> > +++ b/drivers/usb/gadget/function/f_mass_storage.c
> > @@ -627,7 +627,7 @@ static int sleep_thread(struct fsg_common *common, bool can_freeze)
> > }
> > __set_current_state(TASK_RUNNING);
> > common->thread_wakeup_needed = 0;
> > - smp_rmb(); /* ensure the latest bh->state is visible */
> > + smp_mb(); /* ensure the latest bh->state is visible */
> > return rc;
> > }
> >
> > --
> > 2.11.0
>
> Alan, do you have any comments here? Do we really need a full barrier in
> this situation?
Peter Z has pointed out in the past that the locking and barrier
mechanisms in f_mass_storage are not well thought out. He also
suggested using a wait_queue rather than doing some home-made
sleep/wake calculations. I agree that this would simplify the code.
However, in this case I do not believe the bug described above will be
fixed by changing the smp_rmb() to smp_mb() or by going to a
wait_queue. Here's my reasoning; maybe I'm wrong...
The important part of this race can be written schematically like so:

        CPU 0                                           CPU 1
        A: thread_wakeup_needed = 0;
           smp_rmb();
           ...
                                                        B: thread_wakeup_needed = 1;
                                                        C: wake_up_process();
        D: set_current_state(TASK_INTERRUPTIBLE);
        E: if (thread_wakeup_needed)
                break;
           schedule();

Now, set_current_state() uses smp_store_mb(), so D is equivalent to

        D: current->state = TASK_INTERRUPTIBLE;
           smp_mb();

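For reference, the relevant generic definitions look roughly like the
following (architectures may override smp_store_mb(), so treat this as a
sketch rather than the authoritative code):

        /* include/linux/sched.h, roughly: */
        #define set_current_state(state_value)                          \
                smp_store_mb(current->state, (state_value))

        /* include/asm-generic/barrier.h, generic fallback: */
        #define smp_store_mb(var, value)                                \
                do { WRITE_ONCE(var, value); smp_mb(); } while (0)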
Furthermore, wake_up_process(task) must read task->state in order to
see whether task needs to be woken up. HOWEVER, the kerneldoc for
wake_up_process() says:
* It may be assumed that this function implies a write memory barrier before
* changing the task state if and only if any tasks are woken up.
(Side point: This says _write_ memory barrier, not _full_ memory
barrier, which is what is needed here.)
So if CPU 0 was not yet in TASK_INTERRUPTIBLE when CPU 1 ran
wake_up_process(), then CPU 1 would not execute any barriers and we
would have:

        CPU 0                                   CPU 1
        D: Write task->state                    B: Write thread_wakeup_needed
           smp_mb()                             C: Read task->state
        E: Read thread_wakeup_needed

with no memory barrier on CPU 1. In memory model circles this is known
as the SB (Store Buffering) pattern, and without a full barrier on both
sides it is entirely possible for both reads to miss seeing the other
CPU's write. That's what causes CPU 0 to call schedule() with nobody
to wake it up again: CPU 0 sees that thread_wakeup_needed is still 0,
and CPU 1 sees that task->state is still TASK_RUNNING.
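A minimal sketch of that pattern, using stand-in variables rather than
the driver's own (kernel-flavored C, just to make the allowed outcome
concrete):

        int task_state = TASK_RUNNING;  /* stands in for task->state */
        int thread_wakeup_needed;
        int r0, r1;

        void cpu0(void)                 /* sleeper, sleep_thread() */
        {
                WRITE_ONCE(task_state, TASK_INTERRUPTIBLE);     /* D */
                smp_mb();               /* from set_current_state() */
                r0 = READ_ONCE(thread_wakeup_needed);           /* E */
        }

        void cpu1(void)                 /* waker, wakeup_thread() */
        {
                WRITE_ONCE(thread_wakeup_needed, 1);            /* B */
                /* no barrier here */
                r1 = READ_ONCE(task_state);  /* C, inside wake_up_process() */
        }

        /*
         * Without a full barrier between B and C, the outcome
         * r0 == 0 && r1 == TASK_RUNNING is allowed: each CPU misses the
         * other's store, CPU 0 calls schedule(), and CPU 1 decides no
         * wakeup is needed.
         */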
Please note that this analysis remains unchanged if the smp_rmb() after
A is changed to smp_mb(). The problem is not related to the
assignment at A; it is related to the assignment at B. We need to have
a memory barrier between B and C regardless of whether the task was
woken up.
In other words, I believe the correct fix is to add smp_mb() in
f_mass_storage's wakeup_thread() routine, just after
thread_wakeup_needed is set to 1.
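Something along these lines (an untested sketch against the current
wakeup_thread(); the existing smp_wmb() and field names are as they
stand in the driver today):

        static void wakeup_thread(struct fsg_common *common)
        {
                smp_wmb();      /* ensure the write of bh->state is complete */

                /* Tell the main thread that something has happened */
                common->thread_wakeup_needed = 1;

                /*
                 * Order the store above before the task->state read inside
                 * wake_up_process(); this closes the SB window described
                 * above even when the main thread is still running.
                 */
                smp_mb();

                if (common->thread_task)
                        wake_up_process(common->thread_task);
        }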
Before today, I had believed that wake_up_process() implicitly
performed this memory barrier in all cases. But the kerneldoc says it
doesn't, and unless Peter tells us that the kerneldoc is wrong, I see
no alternative to adding the memory barrier explicitly.
Incidentally, going to a wait_queue implementation wouldn't help. With
wait_queues, the pattern for CPU 1 would be:

        thread_wakeup_needed = 1;
        wake_up(&wq);

But wake_up() is implemented as a call to __wake_up(), and the
kerneldoc for __wake_up() says:
* It may be assumed that this function implies a write memory barrier before
* changing the task state if and only if any tasks are woken up.
Just as in wake_up_process() -- and again, what we need is an
unconditional full barrier, not a conditional write barrier. See also
the kerneldoc for waitqueue_active() in include/linux/wait.h; it states
that an explicit smp_mb() is required on the waker.
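Paraphrasing the example in that kerneldoc, the documented pattern is
roughly the following (cond, wq, and wait are placeholders here, not
names from the driver):

        static DECLARE_WAIT_QUEUE_HEAD(wq);
        static bool cond;

        /* waker */
        cond = true;
        smp_mb();               /* pairs with the barrier implied by
                                 * set_current_state() on the waiter */
        if (waitqueue_active(&wq))
                wake_up(&wq);

        /* waiter */
        DEFINE_WAIT(wait);
        for (;;) {
                prepare_to_wait(&wq, &wait, TASK_INTERRUPTIBLE);
                if (cond)
                        break;
                schedule();
        }
        finish_wait(&wq, &wait);

That is, the explicit smp_mb() on the waker side is part of the
documented usage; the wake_up() call itself is not guaranteed to
provide it.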
It's hard to believe this sort of problem hasn't bitten other people in
the past. Is there a better solution than to add a memory barrier
every place where wake_up() (or one of its siblings) is called?
Alan Stern
P.S.: This kind of solution is okay for right now and for -stable. For
the longer term, however, it has hardened my resolve
to completely rewrite the locking and barriers in f_mass_storage.
But I would like to get this matter settled first. Is the explicit
barrier truly necessary?