On Fri, Aug 23, 2013 at 10:43:19AM -0400, Alan Stern wrote:
> On Wed, 21 Aug 2013, Sarah Sharp wrote:
> 
> > Background
> > ----------
> > 
> > The USB 2.0 specification, section 7.1.7.7, says that upon device remote
> > wakeup signaling, the first active hub (which is often the roothub) must
> > rebroadcast the resume signaling for at least 20 ms (TDRSMDN).  After
> > that's done, the hub's suspend status change bit will be set, and system
> > software must not access the device for at least 10 ms (TRSMRCY).
> > 
> > It turns out that TRSMRCY is a *minimum*, not a *maximum*, according to
> > Table 7-14.  That means the port can actually take longer than TRSMRCY
> > to resume.  Any attempt to communicate with the device, or reset the
> > device, will result in a USB device disconnect.
> 
> By the way, I just noticed your Google+ posting about this.  I think 
> you (and perhaps the engineers you spoke with) may have misunderstood 
> what Table 7-14 means when it lists 10 ms as the _minimum_ value for 
> TRSMRCY.
> 
> This delay value is a requirement on the OS.  The host system must not
> access the device until at least 10 ms after the resume is complete.  
> The system can wait longer than that if it wants -- that's why 10 ms is
> a minimum.  It just has to avoid accessing the device sooner.
> 
> A _minimum_ value on the host side translates into a _maximum_ value on 
> the device side.  The device can safely assume that it can spend up to 
> 10 ms getting back into shape after a resume, but no more.  After 10 
> ms, the host may try to communicate with it.

After re-reading the spec, I agree with your analysis.  However, the
fact that chipset designers misinterpreted the spec means there may be
hardware out there that needs a longer timeout.  The spec should have
been normative on both the software and the hardware, saying something
like:

"The USB System Software must provide a 10 ms resume recovery time
(TRSMRCY) during which it will not attempt to access any device
connected to the affected (just-activated) bus segment.  The host
controller and device must be ready for communication after the resume
recovery time (TRSMRCY) expires."
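
A minimal sketch of the host-side half of that wording, in C.  The
helper name host_may_access(), the millisecond timestamps, and the
extra_ms quirk parameter are all assumptions made for illustration;
none of them come from the USB core.  It only shows that 10 ms is a
floor the host may exceed, never a deadline it must meet.

```c
#include <stdbool.h>
#include <stdint.h>

/* USB 2.0 Table 7-14: TRSMRCY is a 10 ms minimum on the host side. */
#define TRSMRCY_MIN_MS 10u

/*
 * Hypothetical helper (not a kernel API): returns true once the host is
 * allowed to access a device behind a just-resumed port.
 *
 * resume_done_ms: timestamp at which resume signaling ended
 * now_ms:         current timestamp
 * extra_ms:       optional extra delay for hardware that needs longer
 *                 (an assumed quirk knob, 0 for spec behaviour)
 */
static bool host_may_access(uint32_t resume_done_ms, uint32_t now_ms,
                            uint32_t extra_ms)
{
    /* Unsigned subtraction stays correct across timestamp wraparound. */
    uint32_t elapsed = now_ms - resume_done_ms;

    return elapsed >= TRSMRCY_MIN_MS + extra_ms;
}
```

With extra_ms set to 0 the helper allows access exactly 10 ms after
resume ends; a non-zero extra_ms models the longer timeout that
misdesigned hardware might need.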

I have heard reports of USB devices disconnecting from the bus and
reconnecting after remote wakeup.  I've personally experienced this with
one of my PL2303 USB serial adapters, although it has since died, so I
can't retest.

Another company (whose email I ironically lost due to a failed transfer
to a USB 3.0 backup drive) had bus traces showing the root cause of a
disconnect on resume from remote wakeup.  Occasionally, the host
controller was sending SOFs too soon on resume, and the device would
interpret them as a low-speed chirp.  The device would disconnect, and
change from a high-speed device to a low-speed device.  I don't
think increasing the 10 ms timeout will help at all in this case, but
you did ask what USB device disconnect scenarios I've seen.

If users do see device disconnects on remote wakeup resume, we should
see if increasing the timeout helps.

> > Then, when the USB core calls into get port status, it transitions the
> > port from the Resume state to the RExit state by changing the port link
> > state to U0.  The xHCI driver will get a port status change event when
> > that transition is complete, but that port status change event is
> > currently ignored.
> 
> The excess delay you observe with xHCI is the time spent in the RExit
> substate?  That probably should not be counted as part of the TRSMRCY
> period.  It's hard to say for certain, because TRSMRCY is described
> only in the USB-2 spec and not in the xHCI spec, and vice versa for
> RExit.  Still, it's reasonable to assume that the TRSMRCY period should
> begin when the port changes back to U0, not when it leaves the RESUME
> state and enters RExit.
> 
> So in the end this appears to be a simple bug in xhci-hcd.  The
> Get-Port-Status request that terminates the resume signalling should
> wait until the port goes back into U0 (which agrees with what you have
> already decided, of course).  ehci-hcd does something similar:
> 
>                       /* stop resume signaling */
>                       temp &= ~(PORT_RWC_BITS | PORT_SUSPEND | PORT_RESUME);
>                       ehci_writel(ehci, temp, status_reg);
>                       clear_bit(wIndex, &ehci->resuming_ports);
>                       retval = ehci_handshake(ehci, status_reg,
>                                       PORT_RESUME, 0, 2000 /* 2msec */);
> 
> The ehci_handshake call busy-waits until the controller turns off the
> PORT_RESUME bit, which happens when the port has switched to a
> high-speed idle.  It's supposed to take no more than 2 ms but hopefully
> is a lot faster.  (Hmmm, maybe the private lock should be dropped
> during this handshake...)

Ah, so there is an analogous issue in EHCI.  Basically, the EHCI driver
waiting for the PORT_RESUME bit to clear is equivalent to the xHCI
driver waiting for the port to enter U0.  I agree that this seems like
an xHCI driver issue, and I'll fix it in the driver.
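
The "poll until the status bit clears" pattern can be sketched in
user-space C as below.  This is a simulation under stated assumptions,
not driver code: struct fake_port, fake_read(), fake_handshake(), and
the FAKE_PORT_RESUME bit are hypothetical names, and the poll count
stands in for the microsecond timeout that ehci_handshake takes.

```c
#include <stdint.h>

/* Illustrative bit only; not a claim about the real PORTSC layout. */
#define FAKE_PORT_RESUME (1u << 6)

/* Simulated port: the resume bit clears after a number of reads. */
struct fake_port {
    uint32_t status;
    unsigned reads_until_clear;  /* 0 means the bit never clears */
};

/* Simulated register read that models the hardware finishing resume. */
static uint32_t fake_read(struct fake_port *p)
{
    if (p->reads_until_clear > 0 && --p->reads_until_clear == 0)
        p->status &= ~FAKE_PORT_RESUME;
    return p->status;
}

/*
 * Poll until (status & mask) == done.
 * Returns 0 on success, -1 if max_polls reads pass without a match.
 */
static int fake_handshake(struct fake_port *p, uint32_t mask,
                          uint32_t done, unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if ((fake_read(p) & mask) == done)
            return 0;
        /* A real driver would delay briefly between reads here. */
    }
    return -1;
}
```

Calling fake_handshake(&port, FAKE_PORT_RESUME, 0, 10) on a port whose
bit clears on the third read returns 0; if the bit never clears, the
call gives up after 10 reads and returns -1.  The xHCI equivalent would
wait for the port link state to reach U0 rather than for a single bit
to clear.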

Sarah Sharp