Hi, Vincent:
We tried my patch doesn't fix the issue totally, we still see the
kernel panic.
currently solution in our test is: Rvert "usb: gadget: ffs: Fix BUG
when userland exits with submitted AIO transfers"
and only add the below patch to use the in_interrupt() to avoid the
wait_event_lock_irq() is called in interrupt context.
Subject: [PATCH] fix the wait_event_lock_irq run in interrupt context
the patch is to fix the below bug:
BUG: scheduling while atomic: SettingsProvide/3433/0x00000104
Preemption disabled at:
[<ffffffff81e00053>] __do_softirq+0x53/0x31b
CPU: 1 PID: 3433 Comm: SettingsProvide Tainted: P U
ilt-2e5dc0ac-g3f662bf-dirty #8
Call Trace:
<IRQ>
dump_stack+0x70/0x9e
? __do_softirq+0x53/0x31b
__schedule_bug+0x7f/0xd0
__schedule+0x61d/0x860
? _raw_spin_unlock_irqrestore+0x28/0x50
? prepare_to_wait_event+0x87/0x170
schedule+0x36/0x90
dwc3_gadget_ep_dequeue+0x27a/0x2e0 [dwc3]
? finish_wait+0x90/0x90
usb_ep_dequeue+0x23/0x90
ffs_aio_cancel+0x4c/0x70
kiocb_cancel+0x40/0x50
free_ioctx_users+0x6e/0x100
percpu_ref_switch_to_atomic_rcu+0x159/0x160
rcu_process_callbacks+0x1a7/0x520
__do_softirq+0x11a/0x31b
irq_exit+0xb5/0xc0
smp_apic_timer_interrupt+0x7c/0x160
the BUG is introduced form the patch:
commit: cf3113d89
usb: dwc3: gadget: properly increment dequeue pointer on ep_dequeue
Signed-off-by: he, bo <[email protected]>
---
drivers/usb/dwc3/gadget.c | 112 +++++++++++++++++++++++-----------------------
1 file changed, 57 insertions(+), 55 deletions(-)
diff --git a/drivers/usb/dwc3/gadget.c b/drivers/usb/dwc3/gadget.c
index 7120678..386fd82 100644
--- a/drivers/usb/dwc3/gadget.c
+++ b/drivers/usb/dwc3/gadget.c
@@ -1413,65 +1413,67 @@ static int dwc3_gadget_ep_dequeue(struct usb_ep *ep,
/* wait until it is processed */
dwc3_stop_active_transfer(dwc, dep->number, true);
- /*
- * If request was already started, this means we had to
- * stop the transfer. With that we also need to ignore
- * all TRBs used by the request, however TRBs can only
- * be modified after completion of END_TRANSFER
- * command. So what we do here is that we wait for
- * END_TRANSFER completion and only after that, we jump
- * over TRBs by clearing HWO and incrementing dequeue
- * pointer.
- *
- * Note that we have 2 possible types of transfers here:
- *
- * i) Linear buffer request
- * ii) SG-list based request
- *
- * SG-list based requests will have r->num_pending_sgs
- * set to a valid number (> 0). Linear requests,
- * normally use a single TRB.
- *
- * For each of these two cases, if r->unaligned flag is
- * set, one extra TRB has been used to align transfer
- * size to wMaxPacketSize.
- *
- * All of these cases need to be taken into
- * consideration so we don't mess up our TRB ring
- * pointers.
- */
- wait_event_lock_irq(dep->wait_end_transfer,
- !(dep->flags &
DWC3_EP_END_TRANSFER_PENDING),
- dwc->lock);
-
- if (!r->trb)
- goto out0;
-
- if (r->num_pending_sgs) {
- struct dwc3_trb *trb;
- int i = 0;
-
- for (i = 0; i < r->num_pending_sgs; i++) {
- trb = r->trb + i;
- trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
- dwc3_ep_inc_deq(dep);
- }
+ if(likely(!in_interrupt())) {
+ /*
+ * If request was already started, this means
we had to
+ * stop the transfer. With that we also need to
ignore
+ * all TRBs used by the request, however TRBs
can only
+ * be modified after completion of END_TRANSFER
+ * command. So what we do here is that we wait
for
+ * END_TRANSFER completion and only after that,
we jump
+ * over TRBs by clearing HWO and incrementing
dequeue
+ * pointer.
+ *
+ * Note that we have 2 possible types of
transfers here:
+ *
+ * i) Linear buffer request
+ * ii) SG-list based request
+ *
+ * SG-list based requests will have
r->num_pending_sgs
+ * set to a valid number (> 0). Linear requests,
+ * normally use a single TRB.
+ *
+ * For each of these two cases, if r->unaligned
flag is
+ * set, one extra TRB has been used to align
transfer
+ * size to wMaxPacketSize.
+ *
+ * All of these cases need to be taken into
+ * consideration so we don't mess up our TRB
ring
+ * pointers.
+ */
+ wait_event_lock_irq(dep->wait_end_transfer,
+ !(dep->flags &
DWC3_EP_END_TRANSFER_PENDING),
+ dwc->lock);
+
+ if (!r->trb)
+ goto out0;
+
+ if (r->num_pending_sgs) {
+ struct dwc3_trb *trb;
+ int i = 0;
+
+ for (i = 0; i < r->num_pending_sgs;
i++) {
+ trb = r->trb + i;
+ trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
+ dwc3_ep_inc_deq(dep);
+ }
+
+ if (r->unaligned || r->zero) {
+ trb = r->trb +
r->num_pending_sgs + 1;
+ trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
+ dwc3_ep_inc_deq(dep);
+ }
+ } else {
+ struct dwc3_trb *trb = r->trb;
- if (r->unaligned || r->zero) {
- trb = r->trb + r->num_pending_sgs + 1;
trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
dwc3_ep_inc_deq(dep);
- }
- } else {
- struct dwc3_trb *trb = r->trb;
- trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
- dwc3_ep_inc_deq(dep);
-
- if (r->unaligned || r->zero) {
- trb = r->trb + 1;
- trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
- dwc3_ep_inc_deq(dep);
+ if (r->unaligned || r->zero) {
+ trb = r->trb + 1;
+ trb->ctrl &= ~DWC3_TRB_CTRL_HWO;
+ dwc3_ep_inc_deq(dep);
+ }
}
}
goto out1;
--
2.7.4
-----Original Message-----
From: Vincent Pelletier <[email protected]>
Sent: Tuesday, September 25, 2018 8:46 PM
To: He, Bo <[email protected]>
Cc: Felipe Balbi <[email protected]>; Alan Stern
<[email protected]>; Roger Quadros <[email protected]>; Lars-Peter Clausen
<[email protected]>; Sam Protsenko <[email protected]>;
[email protected]; Praneeth Bajjuri <[email protected]>; Bai, Jie A
<[email protected]>
Subject: Re: usb: gadget: ffs: Fix BUG when userland exits with submitted AIO
transfers
Hello,
On Thu, 2 Aug 2018 14:23:51 +0000, Vincent Pelletier <[email protected]>
wrote:
> On Thu, 2 Aug 2018 00:45:14 +0000, "He, Bo" <[email protected]> wrote:
> > Your patch fix the issue BUG: scheduling while atomic:
>
> Yes, although from my understanding of Felipe's answer, the actual bug
> is the "scheduling" part (sleeping in dwc3 UDC) rather than the
> "atomic" part.
>
> So my patch addresses, still if my understanding is correct, the wrong
> half of the problem, and even introduced the regression you identified.
> Hence my uncertainty...
I notice that neither He's patch, nor a dwc3 change to prevent it from
scheduling inside usb_ep_dequeue are in Linus' master. Please correct me if I
missed something.
Just in case my previous emails were not clear:
- I have no objection to He's patch on its own (and I do not know the
code nearly enough to provide a meaningful reviewed-by).
- I do not intend to work on making changes to dwc3 gadget to stop it
from scheduling in usb_ep_dequeue in the foreseeable future.
So if there is no ongoing work on dwc3 cancel behaviour (Felipe ?), please do
resume/carry on with reviewing and integrating He's patch.
It is only *if* dwc3 cancel stops scheduling that I believe my patch should be
reverted (here is the hash as of Linus' master):
commit d52e4d0c0c428bf2ba35074a7495cdb28e2efbae
Author: Vincent Pelletier <[email protected]>
Date: Wed Jun 13 11:05:06 2018 +0000
usb: gadget: ffs: Fix BUG when userland exits with submitted AIO transfers
This bug happens only when the UDC needs to sleep during usb_ep_dequeue,
as is the case for (at least) dwc3.
and, in my understanding, a consequence is that He's fix would not be needed
anymore - the bug my patch introduced disappearing in the revert.
Regards,
--
Vincent Pelletier