On 11.04.2016 08:28, Felipe Balbi wrote:

Hi,

Matthew Giassa <matt...@giassa.net> writes:
*Migrating from linux-media mailing list.

Good day,

I maintain an SDK for USB2.0 and USB3.0 U3V machine vision cameras, and
several of our customers have reported severe issues since upgrading
from kernel 3.19.0-51 (Ubuntu 14.04.3 LTS) to kernel 4.2.0-34 (Ubuntu
14.04.4 LTS). I've received helpful advice from members of the libusb and
linux-usb mailing lists on how to generate useful logs to help diagnose
the issue, and have filed a bug to track this issue at:

    https://bugzilla.kernel.org/show_bug.cgi?id=115961

It seems that with kernels newer than the 3.19 series (I've tested on
4.2.0-34, and just repeated the tests on the latest 4.5.0 vanilla
release), the cameras lock up and cannot stream image data to the user
application. I am able to resolve the issue on 4.2.0-34 by disabling USB
power management by adding "usbcore.autosuspend=-1". On the 4.5 kernel,
this "trick" doesn't work at all, and I have no way to get the cameras to
stream data. I can do simple USB control requests to query things like
register values and serial numbers, but that's it. Asynchronous bulk
transfers never succeed.

I am using the libusb library to communicate with the cameras, and have
a fairly simple minimal working example to test the cameras, which:
   * Creates a default libusb context.
   * Enumerates available USB devices.
   * Filters out a select device based on the vendor/product ID values.
   * Opens the device, claims the bulk interface, and starts requesting
     frames of image data via async bulk transfer requests.

The issue arises only in specific cases:
   * The issue does not occur when using USB2.0 cameras on a USB2.0 port,
     regardless of the kernel in use.
   * The issue occurs only on Intel 8 Series and Intel 9 Series USB3.0
     host controllers with 4.x kernels.
   * Intel 10 Series host controllers have not yet been tested.
   * The issue never occurs on Fresco or Renesas host controllers,
     regardless of the kernel in use.
   * From visual inspection of lsusb output, the issue only appears to
     happen when the U1 and U2 options are available to the device.

Okay, this is probably what's happening: Intel hosts are the only ones
actively doing LPM; we don't have support for any other hosts.

Here are a few questions:

a) Are you sure your cameras implement proper LPM?
b) I'm assuming the following makes the problem go away; can you test?

diff --git a/drivers/usb/host/xhci-pci.c b/drivers/usb/host/xhci-pci.c
index f0640b7a1c42..9c3ead114ad5 100644
--- a/drivers/usb/host/xhci-pci.c
+++ b/drivers/usb/host/xhci-pci.c
@@ -127,8 +127,8 @@ static void xhci_pci_quirks(struct device *dev, struct xhci_hcd *xhci)
                xhci->quirks |= XHCI_TRUST_TX_LENGTH;

        if (pdev->vendor == PCI_VENDOR_ID_INTEL) {
-               xhci->quirks |= XHCI_LPM_SUPPORT;
-               xhci->quirks |= XHCI_INTEL_HOST;
+               /* xhci->quirks |= XHCI_LPM_SUPPORT; */
+               /* xhci->quirks |= XHCI_INTEL_HOST; */
                xhci->quirks |= XHCI_AVOID_BEI;
        }
        if (pdev->vendor == PCI_VENDOR_ID_INTEL &&

c) I remember that I mentioned to Mathias that I'd seen XHCI LPM go a
bit crazy and trigger constant LPM transactions; maybe you're just the
first one to notice an actual functional breakage due to that.


LPM (Link power management) is only enabled on Intel Hosts in the xhci driver.

usbmon shows there is a constant loop of enabling and disabling LPM.
Parsing the usbmon output, it boils down to:

Set SEL                                 0a0a0002 0002    (usb_enable_lpm)
Set port feature U1 timeout             50us
Set device feature U1 Enable
Set SEL                                 0a0a0002 0002
Set port feature U2 timeout             ~10ms
Set device feature U2 Enable
Set port feature U1 timeout             0us             (usb_disable_lpm)
Clear device feature U1 Enable
Set port feature U2 timeout             0us
Clear device feature U2 Enable
- back to the beginning, setting SEL (System Exit Latency)

Sometimes there's a 24-byte bulk OUT transfer followed by a 16-byte bulk
IN transfer in between the usb_enable_lpm() and usb_disable_lpm() calls.
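
For anyone reading the Set SEL data in the trace, here is a minimal sketch
of how the six payload bytes could be decoded. It assumes the bytes appear
in wire order, as usbmon's text format prints them, and that the layout is
the standard SET_SEL one (U1SEL and U1PEL as single bytes, U2SEL and U2PEL
as 16-bit little-endian values, all in microseconds). The names sel_values,
parse_hex_bytes and decode_set_sel are made up for illustration and are not
kernel or libusb APIs.

```c
#include <stddef.h>
#include <stdint.h>

/* Decoded SET_SEL payload; all values are in microseconds. */
struct sel_values {
    uint8_t  u1_sel;   /* U1 system exit latency */
    uint8_t  u1_pel;   /* U1 device-to-host exit latency */
    uint16_t u2_sel;   /* U2 system exit latency */
    uint16_t u2_pel;   /* U2 device-to-host exit latency */
};

/* Convert one hex digit to its value, or return -1 if it is not hex. */
static int hex_nibble(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Parse hex text such as "0a0a0002 0002" into bytes, skipping spaces.
 * Returns the number of bytes written, or -1 on malformed input or if
 * the text holds more than max bytes.
 */
static int parse_hex_bytes(const char *text, uint8_t *out, size_t max)
{
    size_t n = 0;

    while (*text) {
        if (*text == ' ') {
            text++;
            continue;
        }
        int hi = hex_nibble(text[0]);
        int lo = (hi < 0) ? -1 : hex_nibble(text[1]);
        if (hi < 0 || lo < 0 || n == max)
            return -1;
        out[n++] = (uint8_t)((hi << 4) | lo);
        text += 2;
    }
    return (int)n;
}

/* Decode a 6-byte SET_SEL payload given as hex text. Returns 0 on success. */
static int decode_set_sel(const char *text, struct sel_values *v)
{
    uint8_t b[6];

    if (parse_hex_bytes(text, b, sizeof b) != 6)
        return -1;
    v->u1_sel = b[0];
    v->u1_pel = b[1];
    v->u2_sel = (uint16_t)(b[2] | (b[3] << 8));  /* little-endian */
    v->u2_pel = (uint16_t)(b[4] | (b[5] << 8));  /* little-endian */
    return 0;
}
```

Calling decode_set_sel("0a0a0002 0002", &v) under those assumptions gives
U1SEL = U1PEL = 10 us and U2SEL = U2PEL = 512 us.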

But how we end up in the LPM loop, or how we sustain it is not yet clear.

This code is in usb core. Could you add usb core debugging with:

echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
and show the dmesg output?

Please skip the xhci debugging, otherwise it will be very hard to read.

-Mathias
