On Mon, 15 Sep 2014 11:59:15 -0400
Alan Stern <[email protected]> wrote:
> On Sat, 13 Sep 2014, Joe Lawrence wrote:
>
> > Hi Alan,
> >
> > I've collected 16 crashes since kicking off automated tests a little
> > over 24 hrs ago.
> >
> > Each crash hit the BUG in qh_destroy() and the only unique debugging
> > printk from ehci_stop() was: "ehci_stop: ehci->num_async = 0".
>
> What about error messages from the check_async_ring routine?
In the crash instances, the error messages never occurred. (I had
added a BUG() in that code block, just to be really clear if/when that
event occurred).
> You can check for this. Sprinkle ehci_info messages throughout
> ehci_stop, printing the value of ehci->async->qh_next.qh. It should be
> NULL the entire time.
I re-ran the tests with the following changes:
- moved num_async debug up above struct ehci_hcd.priv[0] :)
- added verbose debug msg in check_async_ring to display the new
num_async value
- added ehci_info to every other line in ehci_stop
% grep '0000:66:00.1' vmcore-dmesg.txt
[ 4936.956739] pci 0000:66:00.1: [8086:1d2d] type 00 class 0x0c0320
[ 4936.956789] pci 0000:66:00.1: reg 0x10: [mem 0xa0000400-0xa00007ff]
[ 4936.957019] pci 0000:66:00.1: PME# supported from D0 D3hot D3cold
[ 4939.564933] ehci-pci 0000:66:00.1: EHCI Host Controller
[ 4939.565462] ehci-pci 0000:66:00.1: new USB bus registered, assigned bus
number 4
[ 4939.565498] ehci-pci 0000:66:00.1: debug port 2
[ 4939.569466] ehci-pci 0000:66:00.1: cache line size of 64 is not supported
[ 4939.569533] ehci-pci 0000:66:00.1: irq 13, io mem 0xa0000400
[ 4939.574794] ehci-pci 0000:66:00.1: USB 2.0 started, EHCI 1.00
[ 4939.575410] usb usb4: SerialNumber: 0000:66:00.1
[ 4940.089044] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4941.795700] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 0, qh= (null)
[ 4941.895713] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4941.943761] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 0, qh= (null)
[ 4942.002715] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.064346] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 0, qh= (null)
[ 4942.156646] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.226028] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 0, qh= (null)
[ 4942.274185] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.329196] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 0, qh= (null)
[ 4942.391055] ehci-pci 0000:66:00.1: check_async_ring:970
ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
[ 4942.629653] ehci-pci 0000:66:00.1: HC died; cleaning up
[ 4945.663153] ehci-pci 0000:66:00.1: remove, state 1
[ 4945.733934] ehci-pci 0000:66:00.1: ehci_stop:420
ehci(ffff8808419839d0)->num_async = 1
[ 4945.774242] ehci-pci 0000:66:00.1: ehci_stop:427
ehci(ffff8808419839d0)->num_async = 1
[ 4945.814791] ehci-pci 0000:66:00.1: ehci_stop:429
ehci(ffff8808419839d0)->num_async = 1
[ 4945.855938] ehci-pci 0000:66:00.1: ehci_stop:431
ehci(ffff8808419839d0)->num_async = 1
[ 4945.897145] ehci-pci 0000:66:00.1: ehci_stop:433
ehci(ffff8808419839d0)->num_async = 1
[ 4945.938791] ehci-pci 0000:66:00.1: ehci_stop:436
ehci(ffff8808419839d0)->num_async = 1
[ 4945.980491] ehci-pci 0000:66:00.1: ehci_stop:438
ehci(ffff8808419839d0)->num_async = 1
[ 4946.022221] ehci-pci 0000:66:00.1: ehci_stop:440
ehci(ffff8808419839d0)->num_async = 1
[ 4946.063936] ehci-pci 0000:66:00.1: ehci_stop:444
ehci(ffff8808419839d0)->num_async = 1
[ 4946.105631] ehci-pci 0000:66:00.1: ehci_stop:446
ehci(ffff8808419839d0)->num_async = 1
[ 4946.241802] ehci-pci 0000:66:00.1: ehci_stop:448
ehci(ffff8808419839d0)->num_async = 1
where ehci_stop line 420 is function entry and line 448 function
return. Notice that check_async_ring incremented num_async just before
the HC died/removed and then num_async was 1 throughout ehci_stop.
qh_destroy then BUGs with a struct ehci_hcd *ehci of ffff8808419839d0.
-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html