Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855

Alan Stern Wed, 02 May 2018 10:52:58 -0700

On Wed, 2 May 2018, Mathias Nyman wrote:

> On 24.04.2018 16:50, Alan Stern wrote:
> > On Tue, 24 Apr 2018, Mathias Nyman wrote:
> > 
> >>>>> In this situation, the HCD_WAKEUP_PENDING(hcd) test in
> >>>>> hcd-pci.c:suspend_common() should prevent the controller from going
> >>>>> back into D3.  The WAKEUP_PENDING bit gets set in
> >>>>> usb_hcd_resume_root_hub() and it doesn't get cleared until
> >>>>> hcd_bus_resume() runs.
> >>>>>
> >>>>
> >>>> I think xhci never calls usb_hcd_resume_root_hub() in xhci_resume() in 
> >>>> this
> >>>> specific failing case
> >>>>
> >>>> xhci_resume() has a check:
> >>>> /* Resume root hubs only when have pending events. */
> >>>>      status = readl(&xhci->op_regs->status);
> >>>>        if (status & STS_EINT) {
> >>>>          usb_hcd_resume_root_hub(xhci->shared_hcd);
> >>>>          usb_hcd_resume_root_hub(hcd);
> >>>>        }
> >>>>
> >>>> If the check fails, then WAKEUP_PENDING bit is not set, and runtime PM
> >>>> can suspend host controller again. when xhci driver finally gets to 
> >>>> handle the interrupt
> >>>> the controller may be in D3 already
> >>>>
> >>>> This should only happen if xhci_resume() is called before xhci driver 
> >>>> sees a pending interrupt,
> >>>> could be possible as xhci has interrupt moderation enabled.
> >>>
> >>> Then maybe that test should be removed.  Calling
> >>> usb_hcd_resume_root_hub() for every wakeup shouldn't be too bad,
> >>> because there probably are not very many times when the controller gets
> >>> resumed without the root hub also being resumed.
> >>>
> >>
> >> The check was added to fix system suspend issue on a runtime suspended 
> >> host:
> >>
> >> commit d6236f6d1d885aa19d1cd7317346fe795227a3cc
> >>
> >>       xhci: Fix runtime suspended xhci from blocking system suspend.
> >>       
> >>       The system suspend flow as following:
> >>       1, Freeze all user processes and kenrel threads.
> >>       
> >>       2, Try to suspend all devices.
> >>       
> >>       2.1, If pci device is in RPM suspended state, then pci driver will 
> >> try
> >>       to resume it to RPM active state in the prepare stage.
> >>       
> >>       2.2, xhci_resume function calls usb_hcd_resume_root_hub to queue two
> >>       workqueue items to resume usb2&usb3 roothub devices.
> >>       
> >>       2.3, Call suspend callbacks of devices.
> >>       
> >>       2.3.1, All suspend callbacks of all hcd's children, including
> >>       roothub devices are called.
> >>       
> >>       2.3.2, Finally, hcd_pci_suspend callback is called.
> >>       
> >>       Due to workqueue threads were already frozen in step 1, the workqueue
> >>       items can't be scheduled, and the roothub devices can't be resumed in
> >>       this flow. The HCD_FLAG_WAKEUP_PENDING flag which is set in
> >>       usb_hcd_resume_root_hub won't be cleared. Finally,
> >>       hcd_pci_suspend will return -EBUSY, and system suspend fails.
> > 
> > Hmmm.  I don't recall seeing this problem occur with ehci-hcd.  But
> > then, I haven't tested it very much recently.
> > 
> > We could change to a different work queue, one that doesn't get
> > frozen.  But there's no guarantee that the work items would run before
> > your step 2.3.2.
> > 
> > Maybe we can avoid step 2.1.  I think there have been some recent
> > changes to the PM code in this area.  There may be a flag you can set
> > that will prevent the PCI core from resuming the host controller.
> > 
> > Or maybe we can change step 2.3.1, so that the root hub's suspend
> > callback will first do a resume if the WAKEUP_PENDING flag is set.
> > That might be the most reliable approach.
> > 
> 
> I'm not sure I understand the last suggestion, could you open up how it
> would work?


Here's what I had in mind.  See if you think this would work.

Consider choose_wakeup() in core/driver.c.  That subroutine gets called
by usb_suspend() when step 2.3.1 wants to suspend a USB device.  We
could patch it as follows:

--- usb-4.x.orig/drivers/usb/core/driver.c
+++ usb-4.x/drivers/usb/core/driver.c
@@ -1449,11 +1449,21 @@ static void choose_wakeup(struct usb_dev
         */
        w = device_may_wakeup(&udev->dev);
 
-       /* If the device is autosuspended with the wrong wakeup setting,
+       /*
+        * If the device is autosuspended with the wrong wakeup setting,
         * autoresume now so the setting can be changed.
+        *
+        * Likewise, if the device is an autosuspended root hub and the
+        * hcd needs to wake it up before the controller can be suspended,
+        * resume it now to clear the WAKEUP_PENDING flag.
         */
-       if (udev->state == USB_STATE_SUSPENDED && w != udev->do_remote_wakeup)
-               pm_runtime_resume(&udev->dev);
+       if (udev->state == USB_STATE_SUSPENDED) {
+               struct usb_hcd  *hcd = bus_to_hcd(udev->bus);
+
+               if (w != udev->do_remote_wakeup ||
+                               (!udev->parent && HCD_WAKEUP_PENDING(hcd)))
+                       pm_runtime_resume(&udev->dev);
+       }
        udev->do_remote_wakeup = w;
 }
 

> I started approaching this from another direction, mainly making sure we don't
> immediately runtime suspend the host controller after resume. Adding a 
> next_statechange
> minimal time between resume and suspend, and checking for pending events in 
> xhci_suspend().
> 
> I'll have some patches to test for [email protected] soon
> 
> These are generic checks that ehci_suspend() also does

To tell the truth, I'm not sure how necessary those next_statechange 
tests really are.

Alan Stern

--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Since Linux 4.13 tlp or powertop usage cause "xHCI host controller not responding, assume dead" on Dell 5855

Reply via email to