On Fri, 28 Apr 2017, Alan Stern wrote:
> On Fri, 28 Apr 2017, Bart Van Assche wrote:
>
> > Hello,
> >
> > Every time I boot a particular development server the complaint shown
> > below appears in the system log. I don't know when this behavior has
> > been introduced. But I noticed that I can reproduce this with older
> > kernel versions, e.g. 4.4.63. Does anyone know what is going on or how
> > to fix this?
> >
> > Thanks,
> >
> > Bart.
> >
> > usb 1-8: new low-speed USB device number 2 using xhci_hcd
> > usb 3-1: new high-speed USB device number 2 using ehci-pci
> >
> > =========================================================
> > [ INFO: possible irq lock inversion dependency detected ]
> > 4.11.0-rc8-dbg+ #1 Not tainted
> > ---------------------------------------------------------
> > swapper/7/0 just changed the state of lock:
> > �(&(&ehci->lock)->rlock){-.-...}, at: [<ffffffffa0130a69>]
> > ehci_hrtimer_func+0x29/0xc0 [ehci_hcd]
> > but this lock took another, HARDIRQ-unsafe lock in the past:
> > �(hcd_urb_list_lock){+.....}
> >
> >
> > and interrupts could create inverse lock ordering between them.
>
> I often find these lockdep splats difficult to understand. In this
> case, why does lockdep think hcd_urb_list_lock is hardirq-unsafe? It's
> supposed to be hardirq-safe.
Peter Zijlstra kindly provided the answer to my question off-list; see
below.
> > other info that might help us debug this:
> > �Possible interrupt unsafe locking scenario:
> >
> > �������CPU0��������������������CPU1
> > �������----��������������������----
> > � lock(hcd_urb_list_lock);
> > �������������������������������local_irq_disable();
> > �������������������������������lock(&(&ehci->lock)->rlock);
> > �������������������������������lock(hcd_urb_list_lock);
> > � <Interrupt>
> > ����lock(&(&ehci->lock)->rlock);
> > �*** DEADLOCK ***
> >
> > no locks held by swapper/7/0.
> > the shortest dependencies between 2nd lock and 1st lock:
> > �-> (hcd_urb_list_lock){+.....} ops: 252 {
> > ����HARDIRQ-ON-W at:
> > ����������������������__lock_acquire+0x602/0x1280
> > ����������������������lock_acquire+0xd5/0x1c0
> > ����������������������_raw_spin_lock+0x2f/0x40
> > ����������������������usb_hcd_unlink_urb_from_ep+0x1b/0x60 [usbcore]
> > ����������������������xhci_giveback_urb_in_irq.isra.45+0x70/0x1b0 [xhci_hcd]
> > ����������������������finish_td.constprop.60+0x1d8/0x2e0 [xhci_hcd]
> > ����������������������xhci_irq+0xdd6/0x1fa0 [xhci_hcd]
> > ����������������������usb_hcd_irq+0x26/0x40 [usbcore]
> > ����������������������irq_forced_thread_fn+0x2f/0x70
> > ����������������������irq_thread+0x149/0x1d0
> > ����������������������kthread+0x113/0x150
> > ����������������������ret_from_fork+0x2e/0x40
He wrote:
> There, that's where we hold the lock with interrupts enabled.
>
> So someone forced interrupt threading and the code assumes the interrupt
> code has IRQs disabled or something along those lines.
So we should be able to fix the problem by changing spin_lock/unlock in
xhci_irq() to spin_lock_irqsave/restore. Bart, please try out the
patch below.
Alan Stern
Index: usb-4.x/drivers/usb/host/xhci-ring.c
===================================================================
--- usb-4.x.orig/drivers/usb/host/xhci-ring.c
+++ usb-4.x/drivers/usb/host/xhci-ring.c
@@ -2628,26 +2628,27 @@ static int xhci_handle_event(struct xhci
irqreturn_t xhci_irq(struct usb_hcd *hcd)
{
struct xhci_hcd *xhci = hcd_to_xhci(hcd);
+ unsigned long flags;
u32 status;
u64 temp_64;
union xhci_trb *event_ring_deq;
dma_addr_t deq;
- spin_lock(&xhci->lock);
+ spin_lock_irqsave(&xhci->lock, flags);
/* Check if the xHC generated the interrupt, or the irq is shared */
status = readl(&xhci->op_regs->status);
if (status == 0xffffffff)
goto hw_died;
if (!(status & STS_EINT)) {
- spin_unlock(&xhci->lock);
+ spin_unlock_irqrestore(&xhci->lock, flags);
return IRQ_NONE;
}
if (status & STS_FATAL) {
xhci_warn(xhci, "WARNING: Host System Error\n");
xhci_halt(xhci);
hw_died:
- spin_unlock(&xhci->lock);
+ spin_unlock_irqrestore(&xhci->lock, flags);
return IRQ_HANDLED;
}
@@ -2679,7 +2680,7 @@ hw_died:
temp_64 = xhci_read_64(xhci, &xhci->ir_set->erst_dequeue);
xhci_write_64(xhci, temp_64 | ERST_EHB,
&xhci->ir_set->erst_dequeue);
- spin_unlock(&xhci->lock);
+ spin_unlock_irqrestore(&xhci->lock, flags);
return IRQ_HANDLED;
}
@@ -2707,7 +2708,7 @@ hw_died:
temp_64 |= ERST_EHB;
xhci_write_64(xhci, temp_64, &xhci->ir_set->erst_dequeue);
- spin_unlock(&xhci->lock);
+ spin_unlock_irqrestore(&xhci->lock, flags);
return IRQ_HANDLED;
}
--
To unsubscribe from this list: send the line "unsubscribe linux-usb" in
the body of a message to [email protected]
More majordomo info at http://vger.kernel.org/majordomo-info.html