On Wed, 28 Sep 2022, Hans Petter Selasky wrote:

On 9/28/22 11:07, Ivan Quitschal wrote:


On Tue, 27 Sep 2022, Hans Petter Selasky wrote:

On 9/27/22 15:22, Hans Petter Selasky wrote:
On 9/27/22 14:17, Ivan Quitschal wrote:


On Tue, 27 Sep 2022, Hans Petter Selasky wrote:

On 9/27/22 02:24, Alexander Motin wrote:
On 26.09.2022 17:29, Hans Petter Selasky wrote:
I've got a supposedly "broken" if_ure dongle from Alexander, but I'm unable to reproduce the if_ure hang on two different pieces of XHCI hardware, Intel based and AMD based, which I've got.

This leads me to believe there is a bug in the XHCI driver or hardware on your system.

Can you share the pciconfig -lv output for your XHCI controllers?

I have two laptops of different generations reproducing this problem, but both are having Thunderbolt on the USB-C ports:

This is one (7th Gen Core i7):

xhci1@pci0:56:0:0:      class=0x0c0330 rev=0x02 hdr=0x00 vendor=0x8086 device=0x15d4 subvendor=0x2222 subdevice=0x1111
     vendor     = 'Intel Corporation'
     device     = 'JHL6540 Thunderbolt 3 USB Controller (C step) [Alpine Ridge 4C 2016]'
     class      = serial bus
     subclass   = USB
     bar   [10] = type Memory, range 32, base 0xc3f00000, size 65536, enabled
     cap 01[80] = powerspec 3  supports D0 D1 D2 D3  current D0
     cap 05[88] = MSI supports 8 messages, 64 bit enabled with 1 message
     cap 10[c0] = PCI-Express 2 endpoint max data 128(128) RO NS
                  max read 512
                  link x4(x4) speed 2.5(2.5) ASPM disabled(L0s/L1) ClockPM disabled
     ecap 0003[100] = Serial 1 20ff910876f10c00
     ecap 0001[200] = AER 1 0 fatal 0 non-fatal 1 corrected
     ecap 0002[300] = VC 1 max VC0
     ecap 0004[400] = Power Budgeting 1
     ecap 000b[500] = Vendor [1] ID 1234 Rev 1 Length 216
     ecap 0018[600] = LTR 1
     ecap 0019[700] = PCIe Sec 1 lane errors 0

This is another (11th Gen Core i7);

xhci0@pci0:0:13:0:      class=0x0c0330 rev=0x01 hdr=0x00 vendor=0x8086 device=0x9a13 subvendor=0x1028 subdevice=0x0991
     vendor     = 'Intel Corporation'
     device     = 'Tiger Lake-LP Thunderbolt 4 USB Controller'
     class      = serial bus
     subclass   = USB
     bar   [10] = type Memory, range 64, base 0x60552c0000, size 65536, enabled
     cap 01[70] = powerspec 2  supports D0 D3  current D0
     cap 05[80] = MSI supports 8 messages, 64 bit enabled with 1 message
     cap 09[90] = vendor (length 20) Intel cap 15 version 0
     cap 09[b0] = vendor (length 0) Intel cap 0 version 1

Does the system you also has Thunderbolt chip, or you use native Intel chipet's XHCI?

Also, when running the stress test and you see the traffic stops, what happens if you run this command as root on the ugen which the if_ure belongs to:

usbconfig -d ugenX.Y dump_string 0

Does the traffic resume?

Nope. Out of 4 times when traffic stopped 2 times it reported <read error> and 2 times it completed successfully, but it neither case it recovered traffic.  Only reset recovered it.


Hi Alexander,

Could you run "usbdump -d X.Y" at the same time to capture all the errors?

Looking especially for USB_ERR_TIMEOUT .

I have this:

xhci0@pci0:3:0:3:    class=0x0c0330 rev=0x00 hdr=0x00 vendor=0x1022 device=0x15e0 subvendor=0x1849 subdevice=0xffff
   vendor     = 'Advanced Micro Devices, Inc. [AMD]'
   device     = 'Raven USB 3.1'
   class      = serial bus
   subclass   = USB

xhci0@pci0:0:20:0:    class=0x0c0330 rev=0x21 hdr=0x00 vendor=0x8086 device=0x9d2f subvendor=0x8086 subdevice=0x9d2f
   vendor     = 'Intel Corporation'
   device     = 'Sunrise Point-LP USB 3.0 xHCI Controller'
   class      = serial bus
   subclass   = USB

--HPS



hi Hans

i think i got some good logs for you

before the problem i ran this:

ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA)

# usbconfig -d ugen0.10 >> before
# usbconfig -d ugen0.10 dump_all_desc >> before
# usbconfig -d ugen0.10 dump_stats >> before_status

the after the problem happened i ran

# usbconfig -d ugen0.10 >> after
# usbconfig -d ugen0.10 dump_all_desc >> after
# usbconfig -d ugen0.10 dump_stats >> after_status


just by looking i already see some problems comparing both

for example

before the problem we have:

----------------------
ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA) ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA)

   bLength = 0x0012
   bDescriptorType = 0x0001
   bcdUSB = 0x0300
   bDeviceClass = 0x0000  <Probed by interface class>
   bDeviceSubClass = 0x0000
   bDeviceProtocol = 0x0000
   bMaxPacketSize0 = 0x0009
   idVendor = 0x2357
   idProduct = 0x0601
   bcdDevice = 0x3000
****
   iManufacturer = 0x0001  <TP-Link>
   iProduct = 0x0002  <USB 10/100/1000 LAN>
   iSerialNumber = 0x0006  <000001>
   bNumConfigurations = 0x0002

------------------------

after the problem

--------------------------
ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA) ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA)

   bLength = 0x0012
   bDescriptorType = 0x0001
   bcdUSB = 0x0300
   bDeviceClass = 0x0000  <Probed by interface class>
   bDeviceSubClass = 0x0000
   bDeviceProtocol = 0x0000
   bMaxPacketSize0 = 0x0009
   idVendor = 0x2357
   idProduct = 0x0601
   bcdDevice = 0x3000
****
iManufacturer = 0x0001  <retrieving string failed>
   iProduct = 0x0002  <retrieving string failed>
   iSerialNumber = 0x0006  <retrieving string failed>
   bNumConfigurations = 0x0002

  Configuration index 0 --------------------------


the log in ttyv0 was this:

ure0: at uhub0, port 14, addr 9 (disconnected)
Sep 27 08:55:58 tzk-inspiron dhclient[1201]: receive_packet failed on ue0: Device not configured Sep 27 08:55:58 tzk-inspiron dhclient[1201]: ioctl(SIOCGIFFLAGS) on ue0: Operation not permitted Sep 27 08:55:58 tzk-inspiron dhclient[1201]: Interface ue0 no longer appears valid. Sep 27 08:55:58 tzk-inspiron dhclient[1201]: No live interfaces to poll on - exiting.
Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
Sep 27 08:55:58 tzk-inspiron dhclient[1201]: connection closed
Sep 27 08:55:58 tzk-inspiron dhclient[1201]: exiting.
rgephy0: detached
miibus0: detached
ure0: detached


difference between before_status and after_status

before_status:

ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA)

{
     UE_CONTROL_OK       : 2389
     UE_ISOCHRONOUS_OK   : 0
     UE_BULK_OK          : 803
     UE_INTERRUPT_OK     : 0
     UE_CONTROL_FAIL     : 0
     UE_ISOCHRONOUS_FAIL : 0
     UE_BULK_FAIL        : 0
     UE_INTERRUPT_FAIL   : 0
}


after_status:

ugen0.10: <TP-Link USB 10/100/1000 LAN> at usbus0, cfg=0 md=HOST spd=SUPER (5.0Gbps) pwr=ON (72mA)

{
     UE_CONTROL_OK       : 4275
     UE_ISOCHRONOUS_OK   : 0
     UE_BULK_OK          : 1126702
     UE_INTERRUPT_OK     : 0
     UE_CONTROL_FAIL     : 326
     UE_ISOCHRONOUS_FAIL : 0
     UE_BULK_FAIL        : 42
     UE_INTERRUPT_FAIL   : 0
}



i hope that helps


all log files are attached

thanks

--tzk


One more test request:

Make sure you have a kernel with "options USB_DEBUG".

Then after "iperf" says 0bits/s, then you run:

sysctl hw.usb.xhci.debug=29

Then run those usbconfig commands.

Then then:

sysctl hw.usb.xhci.debug=0

And collect the logs from /var/log/messages .

Like said earlier, I wonder if the fault is in the XHCI controller itself or something that has to do with thunderbolt, because on my machine there is no indication of a if_ure device failure. The only way to know for sure is to buy a USB 3.0 wire analyzer, like the beagle one, but I'm unsure if they support USB-C .

Thank you!

--HPS


FYI: There is some experimental thunderbolt support at:

https://nam12.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fhselasky%2Fusb4&amp;data=05%7C01%7C%7Cc2f534f631fd47afec9908daa135d60b%7C84df9e7fe9f640afb435aaaaaaaaaaaa%7C1%7C0%7C637999549868812246%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&amp;sdata=uL5DwbPcldediZmBiufQXnkF7%2F2WQTizqVLAYBHZjqA%3D&amp;reserved=0

But I'm not sure if it supports the hardware you've got.

--HPS



Hi Hans

i got two log versions for you, one with the constant set to 2048 (the working version) , and the other with no patches whatsoever (the bad version)

since the entire logs reached more than 150M of size, i had to cut to the last 1000 lines , hope toats enough

pleaes find attached the two files

the xhci_NOT_working i stoped recording right after the problem ocurred

please let me know if you need something else

thank you

--tzk

I need the full log.

The XHCI driver is very verbose you see.

Maybe you can do some filtering, like figuring out all the status codes you see:

status=1
status=13

and so on.

--HPS



hi Hans

i will try to do some logs filtered like you said , and will also try to compress as much as i want the full ones. ( i may be able to transform 150M into no more than 10M, its all plain text)

i will send you here and the list as soon as i do this .. dont know if ill be able to do this today, but tomorrow you'll get the full logs even if i have to open the ftp for you myself :- )


thanks

--tzk

Reply via email to