David Brownell wrote:

On Wednesday 02 February 2005 5:35 am, [EMAIL PROTECTED] wrote:


Hi,
It seems that under some conditions, when a USB device is physically
disconnected while some URBs are still pending, ehci_endpoint_disable
can loop forever in state QH_STATE_UNLINK.



Hmm, that might explain some rare and intermittent problem reports.

What makes you believe that's what's happening?




A day of work instrumenting with printk()s, and the fact that this patch
"fixes" the problem:

--- kernel/linux-2.6.10/drivers/usb/host/ehci-hcd.c 2004-12-24 22:35:01.000000000 +0100
+++ ehci-hcd.c 2005-02-02 21:33:58.000000000 +0100
@@ -1020,6 +1020,7 @@
       int epnum;
       unsigned long flags;
       struct ehci_qh *qh, *tmp;
+       int rescan_counter;


       /* ASSERT:  any requests/urbs are being unlinked */
       /* ASSERT:  nobody can be submitting urbs for this any more */
@@ -1028,7 +1029,15 @@
       if (epnum != 0 && (ep & USB_DIR_IN))
               epnum |= 0x10;

+       rescan_counter = 0;
+
rescan:
+       rescan_counter ++;
+       if (rescan_counter > 1000) {
+               printk("ehci_endpoint_disable - timed out\n");
+               return;
+       }
+
       spin_lock_irqsave (&ehci->lock, flags);
       qh = (struct ehci_qh *) dev->ep [epnum];
       if (!qh)

This allows my driver to find the device when it has rebooted instead
of hanging forever. I guess we leak some memory with this simple solution,
since the early return skips the rest of the cleanup.
BTW this is *very* easy for me to reproduce with my driver/device
combination.
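
To make the idea behind the workaround concrete, here is a minimal
user-space sketch of a bounded rescan loop. It is not the kernel code:
struct fake_qh, poll_once() and wait_for_idle() are hypothetical names
made up for illustration, and only the state names mirror the ones in
ehci-hcd.c. The point is just that the caller gets a failure result
instead of spinning forever when the QH never leaves QH_STATE_UNLINK.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the driver's QH states (names only). */
enum qh_state { QH_STATE_IDLE, QH_STATE_LINKED, QH_STATE_UNLINK };

/* Toy QH: polls_until_idle < 0 models an unlink that never completes. */
struct fake_qh {
    enum qh_state state;
    int polls_until_idle;
};

/* One simulated rescan pass: the unlink completes after a few polls,
 * unless the QH is marked as stuck. */
static void poll_once(struct fake_qh *qh)
{
    if (qh->state != QH_STATE_UNLINK || qh->polls_until_idle < 0)
        return;
    if (qh->polls_until_idle == 0)
        qh->state = QH_STATE_IDLE;
    else
        qh->polls_until_idle--;
}

/* Rescan at most max_rescans times; report whether the QH went idle. */
static bool wait_for_idle(struct fake_qh *qh, int max_rescans)
{
    for (int i = 0; i < max_rescans; i++) {
        if (qh->state == QH_STATE_IDLE)
            return true;
        poll_once(qh);
    }
    return qh->state == QH_STATE_IDLE;
}
```

A caller would treat a false return as "timed out" and decide what to do
with the still-unlinking QH; simply returning, as the patch does, leaves
that QH allocated.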

I can't really tell whether this is a hardware problem or purely a
software one, as I don't understand the mechanisms involved.



Basic mechanisms:

- If the HC is live:

    * For bulk or control endpoints, set up to use the
      "Interrupt on Async Advance" (IAA) Doorbell IRQ.
      This is what's near the end of ehci-q.c ... basically,
      take the endpoint's QH off the async ring, then arrange
      for someone to ring the doorbell.
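
The two-step unlink described above can be modelled in user space
roughly as follows. This is a toy sketch, not what ehci-q.c does: every
name here (toy_hc, toy_qh, toy_start_unlink, toy_iaa_irq) is an
assumption for illustration, and the "doorbell" is just a pointer to the
QH waiting to be reclaimed. It shows why a QH stays in the unlink state
until the IAA interrupt is handled: if that interrupt never arrives,
nothing moves the QH on to idle.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_qh_state { TOY_QH_IDLE, TOY_QH_LINKED, TOY_QH_UNLINK };

struct toy_qh {
    struct toy_qh *next;
    enum toy_qh_state state;
};

/* Toy host controller: a circular ring with a dummy head, plus the one
 * QH (if any) waiting for the simulated IAA doorbell interrupt. */
struct toy_hc {
    struct toy_qh head;
    struct toy_qh *reclaim;
};

static void toy_hc_init(struct toy_hc *hc)
{
    hc->head.next = &hc->head;
    hc->head.state = TOY_QH_LINKED;
    hc->reclaim = NULL;
}

static void toy_link(struct toy_hc *hc, struct toy_qh *qh)
{
    qh->next = hc->head.next;
    hc->head.next = qh;
    qh->state = TOY_QH_LINKED;
}

/* Step 1: take the QH off the ring and "ring the doorbell".
 * The QH keeps its own next pointer until the interrupt is handled.
 * Do not pass the dummy head itself. */
static bool toy_start_unlink(struct toy_hc *hc, struct toy_qh *qh)
{
    struct toy_qh *prev = &hc->head;

    if (hc->reclaim != NULL || qh->state != TOY_QH_LINKED)
        return false;
    while (prev->next != qh) {
        prev = prev->next;
        if (prev == &hc->head)
            return false;       /* walked the whole ring, not found */
    }
    prev->next = qh->next;
    qh->state = TOY_QH_UNLINK;
    hc->reclaim = qh;
    return true;
}

/* Step 2: the simulated IAA interrupt finishes the unlink. */
static void toy_iaa_irq(struct toy_hc *hc)
{
    struct toy_qh *qh = hc->reclaim;

    if (qh == NULL)
        return;
    qh->next = NULL;
    qh->state = TOY_QH_IDLE;
    hc->reclaim = NULL;
}
```

In this model, a caller that waits for TOY_QH_IDLE without toy_iaa_irq()
ever running would wait forever, which is the shape of the hang reported
at the start of this thread.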



The two endpoints in use are bulk in and out.

Can anyone suggest a debugging technique? This happens with a clean 2.6.10 kernel.



Have you tried pr_debug() calls in the ehci_endpoint_disable() logic, and enabling the existing debug output?

It'd be good to use 2.6.11-rc2 instead of 2.6.10, since some
of that code has changed.




I will look tomorrow.

This is triggered by my user-space driver calling usb_release_interface
when a read or write to the device fails, which is how it detects a
disconnect.



So it could be a case of khubd and your driver competing on doing the unlink processing...



I thought it could be competition too, but then an (arbitrary) delay in
the driver should solve it, and a delay does not seem to help.

/Brian



_______________________________________________
[email protected]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel
