On Thu, 11 Sep 2014, Joe Lawrence wrote:
> Hi Alan,
>
> I've got another USB bug to report that manifests during automated
> device removal testing on RHEL7. This one hits the BUG() inside
> qh_destroy:

How reliably can you trigger this bug?

> static void qh_destroy(struct ehci_hcd *ehci, struct ehci_qh *qh)
> {
> 	/* clean qtds first, and know this is not linked */
> 	if (!list_empty (&qh->qtd_list) || qh->qh_next.ptr) {
> 		ehci_dbg (ehci, "unused qh not empty!\n");
> 		BUG ();
> 	}
>
> and finally a dump of the ehci_qh in question:
>
> crash> struct ehci_qh ffff88084b84dc80
> struct ehci_qh {
>   hw = 0xffff880078d1a000,

It would be good to see the contents of the ehci_qh_hw structure. That
would tell us what device and endpoint this QH was for.

>   qh_dma = 0x78d1a000,
>   qh_next = {
>     qh = 0xffff88084efe6730,
>     itd = 0xffff88084efe6730,
>     sitd = 0xffff88084efe6730,
>     fstn = 0xffff88084efe6730,
>     hw_next = 0xffff88084efe6730,
>     ptr = 0xffff88084efe6730 << !NULL
>   },
>   qtd_list = { << list_empty
>     next = 0xffff88084b84dc98,
>     prev = 0xffff88084b84dc98
>   },
>   intr_node = {
>     next = 0x0,
>     prev = 0x0
>   },
>   dummy = 0xffff880078d22000,
>   unlink_node = {
>     next = 0xffff88084b84dcc0,
>     prev = 0xffff88084b84dcc0
>   },
>   unlink_cycle = 0x0,
>   qh_state = 0x1, << QH_STATE_LINKED
...
> }
>
> The qtd_list is empty: its next and prev pointers both refer back to
> the list head itself.
>
> crash> struct -o ehci_qh | grep td_list
> [0x18] struct list_head qtd_list;
> crash> p/x 0xffff88084b84dc80 + 0x18
> $1 = 0xffff88084b84dc98
>
> but qh->qh_next.ptr is !NULL, so we hit the BUG. However, it seems that
> the memory at qh->qh_next.ptr has been freed:
> I'm not too familiar with the USB code stack, so any suggestions on
> instrumentation that I can add to aid in debugging would be helpful.
> Maybe some tracing in qh_link_async / single_unlink_async /
> end_unlink_async / qh_link_periodic can reveal the sequence that is
> leaving this dangling qh_next.ptr?

The place to look is ehci_endpoint_disable. Did that routine get
called for this QH? Did it hit the default case of the big switch
statement (with its ehci_err statement)?

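For anyone following along, the condition that fires in qh_destroy can be
modelled in userspace. This is only a minimal sketch, not driver code: it
assumes a cut-down structure holding just the two fields the check reads.
The names model_qh, qh_next_ptr and qh_destroy_would_bug are hypothetical
stand-ins invented for illustration; list_head and list_empty mirror the
kernel's definitions but are re-declared here so the example compiles on
its own.

```c
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head. */
struct list_head {
	struct list_head *next, *prev;
};

/* Same test the kernel's list_empty() performs: head points at itself. */
static bool list_empty(const struct list_head *head)
{
	return head->next == head;
}

/* Hypothetical model of struct ehci_qh: only the fields the check uses. */
struct model_qh {
	void *qh_next_ptr;		/* stands in for qh->qh_next.ptr */
	struct list_head qtd_list;
};

/* Returns true in the cases where qh_destroy() would reach BUG(). */
static bool qh_destroy_would_bug(const struct model_qh *qh)
{
	return !list_empty(&qh->qtd_list) || qh->qh_next_ptr != NULL;
}
```

With qtd_list pointing at itself (as in the dump) and qh_next_ptr set to
any non-NULL address, qh_destroy_would_bug() returns true; clearing
qh_next_ptr makes it return false. So the empty qtd_list alone is not
enough to pass the check, which matches what the dump shows.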
> Note: This does bear some resemblance to a bug that Stratus hit a few
> years ago [1] [2], however enough of the code has changed that I'm not
> sure the fix for that one would apply to a modern kernel.

What version of the driver are you currently running?

Alan Stern