It's something weird that started a while back, and so far
it doesn't make sense to me.  But see below, and please
try the patch I've attached.  If it fails, please send
the full CONFIG_USB_DEBUG output showing the error (as
below) and contents of /sys/class/usb_host/usb4/registers
after the fault.

On Monday 03 January 2005 7:06 am, Pedro Venda wrote:

> ehci_hcd: block sizes: qh 128 qtd 96 itd 192 sitd 96
> ACPI: PCI Interrupt Link [LNKH] enabled at IRQ 10
> ACPI: PCI interrupt 0000:00:1d.7[D] -> GSI 10 (level, low) -> IRQ 10
> ehci_hcd 0000:00:1d.7: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB 2.0 EHCI Controller
> ehci_hcd 0000:00:1d.7: reset hcs_params 0x103206 dbg=1 cc=3 pcc=2 ordered !ppc ports=6
> ehci_hcd 0000:00:1d.7: reset hcc_params 6871 thresh 7 uframes 1024 64 bit addr
> ehci_hcd 0000:00:1d.7: capability 0001 at 68
> PCI: Setting latency timer of device 0000:00:1d.7 to 64
> ehci_hcd 0000:00:1d.7: irq 10, pci mem 0xd0000000
> ehci_hcd 0000:00:1d.7: new USB bus registered, assigned bus number 4
> ehci_hcd 0000:00:1d.7: reset command 080002 (park)=0 ithresh=8 period=1024 Reset HALT
> PCI: cache line size of 32 is not supported by device 0000:00:1d.7
> ehci_hcd 0000:00:1d.7: init command 010001 (park)=0 ithresh=1 period=1024 RUN
> ehci_hcd 0000:00:1d.7: USB 2.0 initialized, EHCI 1.00, driver 26 Oct 2004
> ehci_hcd 0000:00:1d.7: supports USB remote wakeup

All that looks just fine.

> usb usb4: new device strings: Mfr=3, Product=2, SerialNumber=1
> usb usb4: default language 0x0409
> usb usb4: Product: Intel Corp. 82801DB/DBM (ICH4/ICH4-M) USB 2.0 EHCI Controller
> usb usb4: Manufacturer: Linux 2.6.10 ehci_hcd
> usb usb4: SerialNumber: 0000:00:1d.7
> usb usb4: hotplug
> uhci_hcd 0000:00:1d.1: port 1 portsc 008a,00
> usb usb4: adding 4-0:1.0 (config #1, interface 0)
> usb 4-0:1.0: hotplug
> hub 4-0:1.0: usb_probe_interface
> hub 4-0:1.0: usb_probe_interface - got id
> hub 4-0:1.0: USB hub found
> hub 4-0:1.0: 6 ports detected
> hub 4-0:1.0: standalone hub
> hub 4-0:1.0: ganged power switching

It shouldn't matter, but this is _slightly_ suspicious.
The EHCI spec seems to say that either there's per-port
power switching, or none at all.  But it also says that
it's safe to diddle the per-port switching bits even if
they're not active.
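
For reference, a rough sketch of how "!ppc" and "ports=6" fall
out of hcs_params (0x103206 above), and how the hub descriptor's
power switching bits get chosen.  The bit positions are as I read
the EHCI spec and the USB 2.0 hub descriptor; the HUB_CHAR_* names
and hub_characteristics() are made up for illustration, not the
driver's own code.

```c
#include <stdint.h>

/* HCSPARAMS fields (EHCI structural parameters register) */
#define HCS_N_PORTS(p)	((p) & 0xf)		/* bits 3:0, root hub ports */
#define HCS_PPC(p)	((p) & (1u << 4))	/* bit 4, port power control */

/* wHubCharacteristics values; names are illustrative only */
#define HUB_CHAR_PER_PORT_POWER	0x0001	/* per-port power control */
#define HUB_CHAR_NO_POWER_SW	0x0002	/* no power switching */
#define HUB_CHAR_PER_PORT_OC	0x0008	/* per-port overcurrent reporting */

/* hypothetical helper: the value the patched descriptor code would report */
static uint16_t hub_characteristics(uint32_t hcs_params)
{
	uint16_t temp = HUB_CHAR_PER_PORT_OC;

	if (HCS_PPC(hcs_params))
		temp |= HUB_CHAR_PER_PORT_POWER;
	else
		temp |= HUB_CHAR_NO_POWER_SW;	/* unpatched: bits left 00, "ganged" */
	return temp;
}
```

With 0x103206 the PPC bit is clear, so the unpatched code leaves
the power switching bits at zero, which the hub driver prints as
"ganged power switching".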


> hub 4-0:1.0: individual port over-current protection
> hub 4-0:1.0: Single TT
> hub 4-0:1.0: TT requires at most 8 FS bit times
> hub 4-0:1.0: power on to power good time: 20ms
> hub 4-0:1.0: local power source is good
> hub 4-0:1.0: enabling power on all ports
> hub 4-0:1.0: state 5 ports 6 chg ffff evt ffff
> ehci_hcd 0000:00:1d.7: GetStatus port 1 status 001030 POWER sig=se0  OCC OC
> hub 4-0:1.0: over-current change on port 1

Seems like a lot of people who get this error get it along with
an "overcurrent" error on port 1, right after the chip has been
initialized.
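
If it helps when reading the GetStatus line, here's a small
sketch that pulls apart a port status value like the 001030
above.  Bit positions are as I understand the EHCI PORTSC
layout; struct port_flags and decode_portsc() are hypothetical
names, not something in the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* PORTSC bits, per my reading of the EHCI spec */
#define PORT_CONNECT	(1u << 0)	/* device connected */
#define PORT_OC		(1u << 4)	/* over-current active */
#define PORT_OCC	(1u << 5)	/* over-current change */
#define PORT_LS_SHIFT	10		/* line status, bits 11:10 */
#define PORT_POWER	(1u << 12)	/* port power */

struct port_flags {
	bool powered, connected, overcurrent, oc_changed;
	unsigned line_state;		/* 0 means SE0 */
};

/* hypothetical helper: split one PORTSC value into readable flags */
static struct port_flags decode_portsc(uint32_t portsc)
{
	struct port_flags f;

	f.powered = (portsc & PORT_POWER) != 0;
	f.connected = (portsc & PORT_CONNECT) != 0;
	f.overcurrent = (portsc & PORT_OC) != 0;
	f.oc_changed = (portsc & PORT_OCC) != 0;
	f.line_state = (portsc >> PORT_LS_SHIFT) & 3;
	return f;
}
```

For 0x001030 that gives power on, nothing connected, line state
SE0, and both the over-current and over-current-change bits set,
which matches "POWER sig=se0  OCC OC".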


> hub 4-0:1.0: enabling power on all ports
> ehci_hcd 0000:00:1d.7: fatal error
> ehci_hcd 0000:00:1d.7: reset command 010003 (park)=0 ithresh=1 period=1024 Reset RUN

Now THAT looks very wrong.  The EHCI spec says that when
it reports the IRQ indicated by that "fatal error" message,
then the controller halts -- so it should say HALT not RUN
there.

One person reported that his controller (ALI) actually
ran just fine if he didn't kick in the fatal error handling
there.  That's why this patch looks at that bit before
deciding what to do ...
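
Stripped of the driver plumbing, the check amounts to something
like the sketch below.  STS_FATAL and STS_HALT are the status
register bits as I read the EHCI spec (host system error and
HCHalted); enum fatal_kind and classify_fatal() are invented
here for illustration and aren't in the patch.

```c
#include <stdint.h>

#define STS_FATAL	(1u << 4)	/* host system error */
#define STS_HALT	(1u << 12)	/* controller has halted */

enum fatal_kind {
	FATAL_NONE,	/* no fatal error reported */
	FATAL_BOGUS,	/* reported, but the controller kept running */
	FATAL_REAL	/* reported and halted, as the spec describes */
};

/* hypothetical helper: decide whether a "fatal error" should be believed */
static enum fatal_kind classify_fatal(uint32_t status)
{
	if (!(status & STS_FATAL))
		return FATAL_NONE;
	return (status & STS_HALT) ? FATAL_REAL : FATAL_BOGUS;
}
```

Only the FATAL_REAL case should lead to the reset and cleanup.
Note the halt bit has to survive the masking done earlier in
the IRQ handler, which is why the patch also widens that mask
to INTR_MASK | STS_HALT.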


> ehci_hcd 0000:00:1d.7: HC died; cleaning up
> hub 4-0:1.0: port 1, status 0108, change 0008, 12 Mb/s
> 
> It seems to me that the ehci driver gets started ok, but while hotplugging
> devices, it crashes and bails out.
> 
> Is this hardware related? Is this a known issue?

It's been seen for a while, but it's quite puzzling.

I suspect the problem comes as a side effect of something
else.  It could be board-related, but it doesn't seem
to be chip-related.   I've seen reports of this with
ALI, NEC, and Intel chips, but it doesn't happen for
me with ALI or NEC chips; and plenty of folk are using
those chips without seeing this problem.

- Dave

Experimental patch, to try

 - catching some "fatal error" IRQs that seem to be bogus
   (HC didn't actually halt, contrary to spec),
   
 - reporting some controllers as "no power switching", since
   those "fatal errors" seem coupled to overcurrent reports
   on at least the first root hub port.


--- 1.94/drivers/usb/host/ehci-hcd.c	Tue Nov 23 00:39:00 2004
+++ edited/drivers/usb/host/ehci-hcd.c	Mon Jan  3 12:39:31 2005
@@ -800,7 +800,7 @@
 		goto dead;
 	}
 
-	status &= INTR_MASK;
+	status &= INTR_MASK | STS_HALT;
 	if (!status) {			/* irq sharing? */
 		spin_unlock(&ehci->lock);
 		return IRQ_NONE;
@@ -864,13 +864,19 @@
 
 	/* PCI errors [4.15.2.4] */
 	if (unlikely ((status & STS_FATAL) != 0)) {
-		ehci_err (ehci, "fatal error\n");
+		dbg_cmd (ehci, "fatal", readl (&ehci->regs->command));
+		dbg_status (ehci, "fatal", readl (&ehci->regs->status));
+		if (!(status & STS_HALT))
+			ehci_err (ehci, "bogus 'fatal' error\n");
+		else {
+			ehci_err (ehci, "fatal error\n");
 dead:
-		ehci_reset (ehci);
-		/* generic layer kills/unlinks all urbs, then
-		 * uses ehci_stop to clean up the rest
-		 */
-		bh = 1;
+			ehci_reset (ehci);
+			/* generic layer kills/unlinks all urbs, then
+			 * uses ehci_stop to clean up the rest
+			 */
+			bh = 1;
+		}
 	}
 
 	if (bh)
--- 1.31/drivers/usb/host/ehci-hub.c	Mon Dec 20 03:48:01 2004
+++ edited/drivers/usb/host/ehci-hub.c	Mon Jan  3 12:10:06 2005
@@ -281,6 +281,8 @@
 	temp = 0x0008;			/* per-port overcurrent reporting */
 	if (HCS_PPC (ehci->hcs_params))
 		temp |= 0x0001;		/* per-port power control */
+	else
+		temp |= 0x0002;		/* no power switching */
 #if 0
 // re-enable when we support USB_PORT_FEAT_INDICATOR below.
 	if (HCS_INDICATOR (ehci->hcs_params))
