David Brownell wrote:
On Wednesday 02 February 2005 5:35 am, [EMAIL PROTECTED] wrote:
Hi,
It seems that under some conditions when a usb device is physically
disconnected and there are some pending urbs ehci_endpoint_disable
can loop forever in state QH_STATE_UNLINK.
Hmm, that might explain some rare and intermittent problem reports.
What makes you believe that's what's happening?
A day of work instrumenting the code with printk()s, plus the fact that this patch "fixes" the problem:
--- kernel/linux-2.6.10/drivers/usb/host/ehci-hcd.c 2004-12-24 22:35:01.000000000 +0100
+++ ehci-hcd.c 2005-02-02 21:33:58.000000000 +0100
@@ -1020,6 +1020,7 @@
int epnum;
unsigned long flags;
struct ehci_qh *qh, *tmp;
+ int rescan_counter;
/* ASSERT: any requests/urbs are being unlinked */
/* ASSERT: nobody can be submitting urbs for this any more */
@@ -1028,7 +1029,15 @@
if (epnum != 0 && (ep & USB_DIR_IN))
epnum |= 0x10;
+ rescan_counter = 0;
+
rescan:
+ rescan_counter ++;
+ if (rescan_counter > 1000) {
+ printk("ehci_endpoint_disable - timed out\n");
+ return;
+ }
+
spin_lock_irqsave (&ehci->lock, flags);
qh = (struct ehci_qh *) dev->ep [epnum];
if (!qh)

This allows my driver to find the device when it has rebooted instead of hanging forever. I guess we leak some memory with this simple solution. BTW, this is *very* easy for me to reproduce with my driver/device combination.
I can't really tell whether this is a hardware problem or purely a software one, as I don't
understand the mechanisms involved.
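
The idea behind the patch (cap the number of rescans so a QH stuck in the unlink state cannot spin forever) can be sketched in user space. This is only an illustration under loud assumptions: `fake_qh`, `fake_poll()`, `bounded_endpoint_disable()` and the `QH_*` enum values below are hypothetical stand-ins, not the kernel's definitions, and the real code also takes ehci->lock and sleeps between rescans.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the driver's QH states (not the kernel's). */
enum qh_state { QH_IDLE, QH_LINKED, QH_UNLINK };

struct fake_qh {
    enum qh_state state;
    int polls_until_idle;   /* negative: the unlink never completes */
};

#define MAX_RESCANS 1000    /* same arbitrary bound as in the patch */

/* One poll of a QH being unlinked: models the completion IRQ arriving
 * after a few polls, or never arriving at all. */
static void fake_poll(struct fake_qh *qh)
{
    if (qh->state != QH_UNLINK || qh->polls_until_idle < 0)
        return;
    if (qh->polls_until_idle == 0)
        qh->state = QH_IDLE;
    else
        qh->polls_until_idle--;
}

/* Returns true once the QH is idle, false if we gave up waiting. */
static bool bounded_endpoint_disable(struct fake_qh *qh)
{
    int rescan_counter = 0;

    for (;;) {
        if (++rescan_counter > MAX_RESCANS) {
            fprintf(stderr, "endpoint_disable - timed out\n");
            return false;   /* the QH is abandoned here, hence the leak */
        }
        switch (qh->state) {
        case QH_LINKED:     /* start the unlink, then rescan */
            qh->state = QH_UNLINK;
            break;
        case QH_UNLINK:     /* wait for completion, then rescan */
            fake_poll(qh);
            break;
        case QH_IDLE:       /* safe to release the QH */
            return true;
        }
    }
}
```

A QH whose unlink completes after a few polls returns true; one that never leaves the unlink state returns false after MAX_RESCANS passes instead of looping forever. That is why the patch only hides the symptom: the completion event is still missing.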
Basic mechanisms:
- If the HC is live:
* For bulk or control endpoints, set up to use the "Interrupt on Async Advance" (IAA) Doorbell IRQ. This is what's near the end of ehci-q.c ... basically, take the endpoint's QH off the async ring, and arrange for someone to ring the doorbell.
The two endpoints in use are bulk in and out.
Can anyone suggest a debugging technique? This happens with a clean 2.6.10 kernel.
Have you tried pr_debug() calls in the ehci_endpoint_disable() logic, and enabling the existing debug output?
It'd be good to use 2.6.11-rc2 instead of 2.6.10, since some of that code has changed.
I will look tomorrow.
This is triggered by my user space driver calling usb_release_interface when it detects a disconnect because a read or write
to the device fails.
So it could be a case of khubd and your driver competing on doing the unlink processing...
I thought it could be competition, but then an (arbitrary) delay in the driver should solve it, and a delay does not seem to help.
/Brian
