> Hi Peter,
> 
> On Fri, Sep 21, 2018 at 09:48:43AM +0800, Peter Chen wrote:
> > Type-C-to-A cable, and the USB3 HCD has already been NULL at that time.
> > The oops log like below:
> >
> > [681.782288] xhci-hcd xhci-hcd.1.auto: remove, state 1 [681.787490]
> > usb usb4: USB disconnect, device number 1 [681.792808] usb 4-1: USB
> > disconnect, device number 2 [681.818089] xhci-hcd xhci-hcd.1.auto: USB
> > bus 4 deregistered [681.823803] Unable to handle kernel NULL pointer
> > dereference at virtual address 000000a0 [681.823806] Mem abort info:
> > [681.823809]   Exception class = DABT (current EL), IL = 32 bits
> > [681.823811]   SET = 0, FnV = 0
> > [681.823813]   EA = 0, S1PTW = 0
> > [681.823814] Data abort info:
> > [681.823816]   ISV = 0, ISS = 0x00000004
> > [681.823818]   CM = 0, WnR = 0
> > [681.823822] user pgtable: 4k pages, 48-bit VAs, pgd =
> > ffff8000ae3fd000 [681.823824] [00000000000000a0] *pgd=0000000000000000
> > [681.823829] Internal error: Oops: 96000004 [#1] PREEMPT SMP
> > [681.823832] Modules linked in: 8021q garp stp mrp crc32_ce qca6174(O)
> crct10dif_ce galcore(O)
> > [681.823849] CPU: 0 PID: 94 Comm: kworker/0:1 Tainted: G           O    
> > 4.14.62-
> imx_4.14.y+gcd63def #1
> > [681.823851] Hardware name: Freescale i.MX8MQ EVK (DT) [681.823862]
> > Workqueue: events_freezable __dwc3_set_mode [681.823865] task:
> > ffff8000b8a18000 task.stack: ffff00000a010000 [681.823872] PC is at
> > xhci_irq+0x5fc/0x14b8 [681.823875] LR is at xhci_irq+0x3c/0x14b8
> 
> <snip>
> 
> > diff --git a/drivers/usb/host/xhci-ring.c
> > b/drivers/usb/host/xhci-ring.c index f0a99aa0ac58..2dc5176b79d0 100644
> > --- a/drivers/usb/host/xhci-ring.c
> > +++ b/drivers/usb/host/xhci-ring.c
> > @@ -2680,7 +2680,8 @@ irqreturn_t xhci_irq(struct usb_hcd *hcd)
> >     }
> >
> >     if (xhci->xhc_state & XHCI_STATE_DYING ||
> > -       xhci->xhc_state & XHCI_STATE_HALTED) {
> > +       xhci->xhc_state & XHCI_STATE_HALTED ||
> > +       xhci->xhc_state & XHCI_STATE_REMOVING) {
> >             xhci_dbg(xhci, "xHCI dying, ignoring interrupt. "
> >                             "Shouldn't IRQs be disabled?\n");
> >             /* Clear the event handler busy flag (RW1C);
> 
> We also noticed the same crash as you found, and tried to fix it in a similar 
> way, but
> noticed that this still allows a memory leak to happen.
> 
> It seems from commit fe190ed0d602a ("xhci: Do not halt the host until both HCD
> have disconnected their devices.") this was added to xhci_stop(), and is the 
> reason
> we encounter the NULL pointer in
> xhci_irq() when it tries to access xhci->shared_hcd.
> 
>       +       /* Only halt host and free memory after both hcds are removed */
>               if (!usb_hcd_is_primary_hcd(hcd)) {
>       +               /* usb core will free this hcd shortly, unset pointer */
>       +               xhci->shared_hcd = NULL;
>                       mutex_unlock(&xhci->mutex);
>                       return;
>               }
> 
> While your fix will simply abort the xhci_irq() function if it encounters
> XHCI_STATE_REMOVING, during xhci_plat_remove():
> 
>       static int xhci_plat_remove(struct platform_device *dev)
>       {
>               struct usb_hcd  *hcd = platform_get_drvdata(dev);
>               struct xhci_hcd *xhci = hcd_to_xhci(hcd);
>               struct clk *clk = xhci->clk;
>               struct clk *reg_clk = xhci->reg_clk;
> 
>               xhci->xhc_state |= XHCI_STATE_REMOVING;
> 
>               usb_remove_hcd(xhci->shared_hcd);
>               ^^^^^^^^^ calls xhci_stop() and sets shared_hcd=NULL
>               usb_phy_shutdown(hcd->usb_phy);
> 
>               usb_remove_hcd(hcd);
>               usb_put_hcd(xhci->shared_hcd);
>               ^^^^^^^^^^^ shared_hcd==NULL, so this is a no-op
> 
> Since usb_put_hcd() doesn't get called for shared_hcd, we end up with one
> additional kref count and hence a leak.
> 
> Wondering if we need to also remove the xhci->shared_hcd = NULL from 
> xhci_stop(),
> in addition to your patches. Thoughts?
> 
 
Hi Jack,

With your thought of removing the xhci->shared_hcd = NULL at xhci_stop, the 
oops in
this commit mentioned has disappeared. It seems removing USB3 HCD structure
not affect it handles USB3 interrupt during the disconnection. Please have a 
test if
only this change can fix your problem. You could submit a patch for this change
as a fix, it fixed the memory leak for USB3 HCD structure too.

Peter

Reply via email to