On 2.10.2019 15.28, Bernhard Gebetsberger wrote:
There has been a regression in the xhci driver since kernel version 4.20: on
some systems, some USB devices won't work until the system is rebooted.
The error message in dmesg is "WARN Set TR Deq Ptr cmd failed due to incorrect slot
or ep state", although for some reason some affected USB devices don't print
the error message (including the device I'm using, though I did get the
error in previous kernel versions).
It seems like this bug can also lead to system instability: one user reported
in the bug tracker (https://bugzilla.kernel.org/show_bug.cgi?id=202541#c58) that
he got a system freeze because of this when using kernel 5.3.1.
OK, let's take a look at this.
Some of the symptoms vary a bit in the report, so let's focus on the ones that
show: "WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state"
When looking at the responses in the bug tracker, it looks like it mostly
affects Ryzen based systems with 300 series motherboards, although there are
some other affected systems as well. It doesn't only affect Wi-Fi/Bluetooth
sticks; some users even got this issue when connecting their smartphone or
their external hard drive to their PC.
I have uploaded the whole dmesg file and the tracing file to transfer.sh:
https://transfer.sh/zYohl/dmesg and https://transfer.sh/KNbFL/xhci-trace
Hmm, trying to download these just shows "Not Found".
Could someone with an affected system enable tracing and dynamic debug on a
recent kernel, and take logs and traces of one failing instance where the message
"WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state" is seen?
mount -t debugfs none /sys/kernel/debug
echo 'module xhci_hcd =p' >/sys/kernel/debug/dynamic_debug/control
echo 'module usbcore =p' >/sys/kernel/debug/dynamic_debug/control
echo 81920 > /sys/kernel/debug/tracing/buffer_size_kb
echo 1 > /sys/kernel/debug/tracing/events/xhci-hcd/enable
< Trigger the issue >
Send output of dmesg
Send content of /sys/kernel/debug/tracing/trace
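Before sending the logs, it may help to confirm that the saved dmesg output really contains the warning. The following is only a minimal sketch: the function name has_tr_deq_warning and the file names dmesg.txt and xhci-trace.txt are hypothetical, chosen for illustration.

```shell
# Hypothetical helper (not part of any kernel tooling): succeeds if the
# given log file contains the warning this thread is focusing on.
has_tr_deq_warning() {
    # -q: no output, only an exit status; -F: match the text literally
    grep -qF 'WARN Set TR Deq Ptr cmd failed due to incorrect slot or ep state' "$1"
}

# Illustrative usage after triggering the issue (file names are assumptions):
#   dmesg > dmesg.txt
#   cat /sys/kernel/debug/tracing/trace > xhci-trace.txt
#   has_tr_deq_warning dmesg.txt && echo "warning captured, logs are worth sending"
```

Devices that fail without printing the warning would not be detected by this check, so it only covers the case being focused on here.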
The issues have occurred since commit f8f80be501aa2f10669585c3e328fad079d8cb3a "xhci: Use
soft retry to recover faster from transaction errors". I think this commit should be
reverted, at least until a workaround has been found, especially since the next two kernel
versions will be used by a lot of distributions (5.4 because it's an LTS kernel, and 5.5
will probably be used in Ubuntu 20.04), so more users would be affected by this.
There is some time left before 5.4 is out, so let's see if we can find the root cause