On Tue, 16 Sep 2014, Joe Lawrence wrote:

> > You can check for this.  Sprinkle ehci_info messages throughout 
> > ehci_stop, printing the value of ehci->async->qh_next.qh.  It should be 
> > NULL the entire time.
> 
> I re-ran the tests with the following changes:
> 
>   - moved num_async debug up above struct ehci_hcd.priv[0] :)
>   - added verbose debug msg in check_async_ring to display the new
>     num_async value
>   - added ehci_info to every other line in ehci_stop
> 
> % grep '0000:66:00.1' vmcore-dmesg.txt
> 
> [ 4936.956739] pci 0000:66:00.1: [8086:1d2d] type 00 class 0x0c0320
> [ 4936.956789] pci 0000:66:00.1: reg 0x10: [mem 0xa0000400-0xa00007ff]
> [ 4936.957019] pci 0000:66:00.1: PME# supported from D0 D3hot D3cold
> [ 4939.564933] ehci-pci 0000:66:00.1: EHCI Host Controller
> [ 4939.565462] ehci-pci 0000:66:00.1: new USB bus registered, assigned bus 
> number 4
> [ 4939.565498] ehci-pci 0000:66:00.1: debug port 2
> [ 4939.569466] ehci-pci 0000:66:00.1: cache line size of 64 is not supported
> [ 4939.569533] ehci-pci 0000:66:00.1: irq 13, io mem 0xa0000400
> [ 4939.574794] ehci-pci 0000:66:00.1: USB 2.0 started, EHCI 1.00
> [ 4939.575410] usb usb4: SerialNumber: 0000:66:00.1
> [ 4940.089044] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4941.795700] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4941.895713] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4941.943761] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.002715] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.064346] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.156646] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.226028] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.274185] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.329196] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 0, qh=          (null)
> [ 4942.391055] ehci-pci 0000:66:00.1: check_async_ring:970 
> ehci(ffff8808419839d0)->num_async = 1, qh=ffff88083d6b4558
> [ 4942.629653] ehci-pci 0000:66:00.1: HC died; cleaning up
> [ 4945.663153] ehci-pci 0000:66:00.1: remove, state 1
> [ 4945.733934] ehci-pci 0000:66:00.1: ehci_stop:420 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.774242] ehci-pci 0000:66:00.1: ehci_stop:427 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.814791] ehci-pci 0000:66:00.1: ehci_stop:429 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.855938] ehci-pci 0000:66:00.1: ehci_stop:431 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.897145] ehci-pci 0000:66:00.1: ehci_stop:433 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.938791] ehci-pci 0000:66:00.1: ehci_stop:436 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4945.980491] ehci-pci 0000:66:00.1: ehci_stop:438 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4946.022221] ehci-pci 0000:66:00.1: ehci_stop:440 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4946.063936] ehci-pci 0000:66:00.1: ehci_stop:444 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4946.105631] ehci-pci 0000:66:00.1: ehci_stop:446 
> ehci(ffff8808419839d0)->num_async = 1
> [ 4946.241802] ehci-pci 0000:66:00.1: ehci_stop:448 
> ehci(ffff8808419839d0)->num_async = 1
> 
> where ehci_stop line 420 is function entry and line 448 function
> return.  Notice that check_async_ring incremented num_async just before
> the HC died/removed and then num_async was 1 throughout ehci_stop.
> 
> qh_destroy then BUGs with a struct ehci_hcd *ehci of ffff8808419839d0.

This says that ehci->num_async ended up equal to 1.  But in your 
previous email, you said the final printout indicated it was equal to 
0.  Did something change in the meantime?

Anyway, the log above means that a QH was linked before the HC died,
but then it was never unlinked.  Please add a line at the start of
ehci_endpoint_disable, printing out the value of ep->hcpriv for
comparison with the values listed above.

Also, let's see the contents of *qh and *(qh->hw) for the left-over QH
at *(ehci->async->qh_next.qh).  You can get those right away from your 
crash dump data, right?

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to