Re: Hitting "unused qh not empty" BUG in qh_destroy

Alan Stern Mon, 15 Sep 2014 08:59:38 -0700

On Sat, 13 Sep 2014, Joe Lawrence wrote:

> Hi Alan,
> 
> I've collected 16 crashes since kicking off automated tests a little
> over 24 hrs ago.
> 
> Each crash hit the BUG in qh_destroy() and the only unique debugging
> printk from ehci_stop() was: "ehci_stop: ehci->num_async = 0".


What about error messages from the check_async_ring routine?

> I can include (or upload) a full (or filtered) vmcore-dmesg.txt if that
> would be more helpful.

Only if it includes one of those messages.

> The debug code I added as you suggested is provided below...
> 
> Thanks,
> 
> -- Joe
> 
> 
> diff -Nupr before/drivers/usb/host/ehci.h after/drivers/usb/host/ehci.h
> --- before/drivers/usb/host/ehci.h    2014-07-16 14:25:31.000000000 -0400
> +++ after/drivers/usb/host/ehci.h     2014-09-12 15:47:18.943478397 -0400
> @@ -231,6 +231,8 @@ struct ehci_hcd {                 /* one per controlle
>  
>       /* platform-specific data -- must come last */
>       unsigned long           priv[0] __aligned(sizeof(s64));
> +
> +     int num_async;
>  };

Not a good idea to put your new field _after_ a field that is
documented as coming last, but in this case it probably won't hurt.

>  /* convert between an HCD pointer and the corresponding EHCI_HCD */
> diff -Nupr before/drivers/usb/host/ehci-hcd.c 
> after/drivers/usb/host/ehci-hcd.c
> --- before/drivers/usb/host/ehci-hcd.c        2014-07-16 14:25:31.000000000 
> -0400
> +++ after/drivers/usb/host/ehci-hcd.c 2014-09-12 15:53:16.863163627 -0400
> @@ -440,6 +440,8 @@ static void ehci_stop (struct usb_hcd *h
>       if (ehci->amd_pll_fix == 1)
>               usb_amd_dev_put();
>  
> +     pr_err("%s: ehci->num_async = %d\n", __func__, ehci->num_async);
> +
>       dbg_status (ehci, "ehci_stop completed",
>                   ehci_readl(ehci, &ehci->regs->status));
>  }

Always 0, as it should be.

> diff -Nupr before/drivers/usb/host/ehci-q.c after/drivers/usb/host/ehci-q.c
> --- before/drivers/usb/host/ehci-q.c  2014-07-16 14:25:31.000000000 -0400
> +++ after/drivers/usb/host/ehci-q.c   2014-09-12 15:52:08.023292291 -0400
> @@ -959,6 +959,24 @@ static void disable_async(struct ehci_hc
>       ehci_poll_ASS(ehci);
>  }
>  
> +static void check_async_ring(struct ehci_hcd *ehci, int add)
> +{
> +     struct ehci_qh *qh;
> +     int n;
> +
> +     qh = ehci->async->qh_next.qh;
> +     n = ehci->num_async += add;
> +     while (qh && n > 0) {
> +             qh = qh->qh_next.qh;
> +             --n;
> +     }
> +     if (qh || n != 0) {
> +             ehci_err(ehci, "EHCI async list corrupted: num %d n %d qh %p\n",
> +                             ehci->num_async, n, qh);
> +             BUG();
> +     }
> +}

In the last call to this function, ehci->num_async must get set to 0 
(because that is the final value as printed out in ehci_stop).  This 
means n = 0, which means ehci->async->qh_next.qh must be NULL.  And 
this routine is the only place where ehci->num_async gets changed.

But then ehci->async->qh_next.qh isn't NULL when the final qh_destroy 
call is made.  This suggests that something changes it in the meantime.

You can check for this.  Sprinkle ehci_info messages throughout 
ehci_stop, printing the value of ehci->async->qh_next.qh.  It should be 
NULL the entire time.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Hitting "unused qh not empty" BUG in qh_destroy

Reply via email to