On Sat, Sep 17, 2022 at 10:40:29PM +0200, Marcus Glocker wrote:
> On Sat, Sep 17, 2022 at 11:22:41AM -0700, Stephan Somogyi wrote:
>
> > Starting with arm64 snapshot kernel 1818 and continuing through 1822, the
> > latest snapshot, I've been seeing a persistent problem: the RPi3's USB bus
> > locks up in a way that requires physical access to power cycle the board,
> > which makes this a fairly serious regression. This system has been running
> > -current continuously since about 6.8-current without anything even
> > remotely like this happening.
> >
> > The 100BASE-T interface is smsc0, on the USB bus. Initially it looked like
> > there might be a weird race condition, since I also had a USB flash drive
> > plugged in, but moving that drive to the other ports, and eventually
> > removing it entirely, hasn't stopped the hangs.
> >
> > The hang is visible in dmesg as follows:
> >
> > usbd_start_next: error=5
> > usbd_start_next: error=5
> > usbd_free_xfer: xfer=0xffffff8004e74a20 not free
> > smsc0: warning: Failed to read register 0x114
> > smsc0: warning: MII is busy
> >
> > Searching around, I found references to some of these errors in FreeBSD
> > and OpenBSD going back to at least 2014, but no clear resolution. It's
> > _possible_ that I have some kind of creeping hardware failure, but it
> > doesn't seem likely.
> >
> > Once the error messages appear, I can no longer reach the system over the
> > network. I've since connected a serial console. If I try to reboot while
> > it's in this state, the system hangs hard and doesn't even respond on the
> > console. Running `ifconfig smsc0 down` hangs it the same way.
> >
> > While the USB drive was still part of the repro configuration, attempting
> > to sync or otherwise access it also caused the hard hang, which leads me
> > to conclude this is a USB issue rather than a mass storage or an Ethernet
> > issue.
> >
> > I've also done the usual variable elimination: different USB drives,
> > different Ethernet cables, different ports, a different switch, etc.
> > I can't isolate this any further on my own.
> >
> > My only recourse once it's in this state is to hard power cycle.
> >
> > I'm happy to help debug further; I'd strongly prefer that
> > 7.2-release/-stable not ship with this behavior.
> >
> > s.
>
> We had some recent changes in dwctwo(4). I currently suspect that your
> issue is related to the last commit to dwc2.c, revision 1.67. I'll
> prepare a diff and send it to you for testing by tomorrow. It may take
> a few iterations; worst case, we can revert that commit.

Does this diff fix your issue?

Index: sys/dev/usb/dwc2/dwc2.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2.c,v
retrieving revision 1.67
diff -u -p -u -p -r1.67 dwc2.c
--- sys/dev/usb/dwc2/dwc2.c 10 Sep 2022 08:13:16 -0000 1.67
+++ sys/dev/usb/dwc2/dwc2.c 18 Sep 2022 07:41:24 -0000
@@ -242,7 +242,6 @@ dwc2_allocx(struct usbd_bus *bus)
void
dwc2_freex(struct usbd_bus *bus, struct usbd_xfer *xfer)
{
- struct dwc2_xfer *dxfer = DWC2_XFER2DXFER(xfer);
struct dwc2_softc *sc = DWC2_BUS2SC(bus);
DPRINTFN(10, "\n");
@@ -255,7 +254,6 @@ dwc2_freex(struct usbd_bus *bus, struct
xfer->busy_free = XFER_FREE;
#endif
DWC2_EVCNT_INCR(sc->sc_ev_xferpoolput);
- dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
pool_put(&sc->sc_xferpool, xfer);
}
Index: sys/dev/usb/dwc2/dwc2_hcd.c
===================================================================
RCS file: /cvs/src/sys/dev/usb/dwc2/dwc2_hcd.c,v
retrieving revision 1.28
diff -u -p -u -p -r1.28 dwc2_hcd.c
--- sys/dev/usb/dwc2/dwc2_hcd.c 9 Sep 2022 21:16:54 -0000 1.28
+++ sys/dev/usb/dwc2/dwc2_hcd.c 18 Sep 2022 07:41:24 -0000
@@ -4312,6 +4312,7 @@ void dwc2_host_complete(struct dwc2_hsot
xfer);
}
+ dwc2_hcd_urb_free(sc->sc_hsotg, dxfer->urb, xfer->nframes);
qtd->urb = NULL;
timeout_del(&xfer->timeout_handle);
usb_rem_task(xfer->device, &xfer->abort_task);