Hi, Mathias Nyman <[email protected]> writes: >> Mathias Nyman <[email protected]> writes: >>> On 03.05.2016 13:30, Felipe Balbi wrote: >>>> When trying to access our last TRB, XHCI was >>>> actually reading memory outside of the TRB >>>> array/ring due to an off-by-one error. >>>> >>>> This patch fixes that error and has the side effect >>>> of also fixing some rare situations where long mass >>>> storage transfers would timeout and XHCI would reset >>>> the mass storage device before continuing. >>>> >>>> Signed-off-by: Felipe Balbi <[email protected]> >>>> --- >>>> drivers/usb/host/xhci-ring.c | 4 ++-- >>>> 1 file changed, 2 insertions(+), 2 deletions(-) >>>> >>>> diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c >>>> index 6c41dbbf9f2f..ba610a72c396 100644 >>>> --- a/drivers/usb/host/xhci-ring.c >>>> +++ b/drivers/usb/host/xhci-ring.c >>>> @@ -95,7 +95,7 @@ static bool last_trb_on_last_seg(struct xhci_hcd *xhci, >>>> struct xhci_ring *ring, >>>> struct xhci_segment *seg, union xhci_trb *trb) >>>> { >>>> if (ring == xhci->event_ring) >>>> - return (trb == &seg->trbs[TRBS_PER_SEGMENT]) && >>>> + return (trb == &seg->trbs[TRBS_PER_SEGMENT - 1]) && >>>> (seg->next == xhci->event_ring->first_seg); >>>> else >>>> return le32_to_cpu(trb->link.control) & LINK_TOGGLE; >>>> @@ -109,7 +109,7 @@ static int last_trb(struct xhci_hcd *xhci, struct >>>> xhci_ring *ring, >>>> struct xhci_segment *seg, union xhci_trb *trb) >>>> { >>>> if (ring == xhci->event_ring) >>>> - return trb == &seg->trbs[TRBS_PER_SEGMENT]; >>>> + return trb == &seg->trbs[TRBS_PER_SEGMENT - 1]; >>>> else >>>> return TRB_TYPE_LINK_LE32(trb->link.control); >>>> } >>>> >>> >>> Thanks, this needs to be fixed, but there are some changes needed to >>> inc_enq() >>> as well together with this fix. >>> Otherwise the last TRB of a event ring won't be used >> >> Any idea how a problem due to this could be triggered ? So far I can't >> get any failures. BTW, the last TRB *will* be used by the HW anyway, no? >> A truer argument would be that Linux XHCI driver might miss handling >> that event. I still can't get this to fail. It seems to me that it's >> pretty much impossible for us to miss an event. >> >> If we consider what happens from the time an IRQ fires, we will have the >> following sequence of events: >> >> MSI/MSI-X/normal IRQ asserted >> xhci_irq() >> while(xhci_handle_event(xhci) > 0); >> event = xhci->event_ring->dequeue; >> /* handle the event given its type */ >> if (update_ptrs) inc_deq(); >> ring->deq_updates++; >> do { >> if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) >> if (ring->type == EVENT && last_trb_on_last_seg()) >> ring-cycle_state ^= 1; >> ring->deq_seg = ring->deq_seg->next; >> ring->dequeue = ring->deq_seg->trbs; >> else >> ring->dequeue++ >> } while (last_trb()) >> > > event ring: > trb[0] > ... > trb[254] > trb[255] = the last trb on the ring, still used for events as event > ring has no link trb. > > One scenario where the dequeue pointer, both the actual hw one, and > the sw one is at trb[254] > > xhci hw writes an event at trb[254], event ring dequeue is at trb[254] > - interrupt, we handle the event at dequeue trb[254] > - call inc_deq(), last_trb(trb[254]) check is false so instead we increase > the dequeue. > - dequeue is now at trb[255], while(last_trb(trb[255]) will now be true, so > go back to "do{". > - last_trb(trb[255]) check is now true, so we jump to next segment. > > so now the driver thinks the dequeue pointer is at the next segment, > but hw is going to write the next event at trb[255] which the driver > is not going to notice -> drive misses one event.
Good point, but it seems to me the problem is that do {} while() loop on
inc_deq(), how can we have more than one last TRB ? Granted, that check
is different for non-EVENT rings (which checks for a link TRB instead -
also wrong check as Link might not be last).
Also, I haven't managed to trigger any failures from this but I have to
fix my commit log as the lock-up -> timeout -> Reset on the mass storage
device still happens even with $subject, albeit not as frequently.
Anyway, seems like below is also needed (maybe as patch 1 on a series of
two patches ?)
diff --git a/drivers/usb/host/xhci-ring.c b/drivers/usb/host/xhci-ring.c
index ba610a72c396..6f991b7c4249 100644
--- a/drivers/usb/host/xhci-ring.c
+++ b/drivers/usb/host/xhci-ring.c
@@ -153,24 +153,21 @@ static void inc_deq(struct xhci_hcd *xhci, struct
xhci_ring *ring)
!last_trb(xhci, ring, ring->deq_seg, ring->dequeue))
ring->num_trbs_free++;
- do {
+ ring->dequeue++;
+
+ if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) {
/*
* Update the dequeue pointer further if that was a link TRB or
* we're at the end of an event ring segment (which doesn't have
* link TRBS)
*/
- if (last_trb(xhci, ring, ring->deq_seg, ring->dequeue)) {
- if (ring->type == TYPE_EVENT &&
- last_trb_on_last_seg(xhci, ring,
- ring->deq_seg, ring->dequeue)) {
- ring->cycle_state ^= 1;
- }
- ring->deq_seg = ring->deq_seg->next;
- ring->dequeue = ring->deq_seg->trbs;
- } else {
- ring->dequeue++;
- }
- } while (last_trb(xhci, ring, ring->deq_seg, ring->dequeue));
+ if (ring->type == TYPE_EVENT && last_trb_on_last_seg(xhci, ring,
+ ring->deq_seg, ring->dequeue))
+ ring->cycle_state ^= 1;
+
+ ring->deq_seg = ring->deq_seg->next;
+ ring->dequeue = ring->deq_seg->trbs;
+ }
}
/*
@@ -207,7 +204,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring
*ring,
/* Update the dequeue pointer further if that was a link TRB or we're at
* the end of an event ring segment (which doesn't have link TRBS)
*/
- while (last_trb(xhci, ring, ring->enq_seg, next)) {
+ if (last_trb(xhci, ring, ring->enq_seg, next)) {
if (ring->type != TYPE_EVENT) {
/*
* If the caller doesn't plan on enqueueing more
@@ -218,7 +215,7 @@ static void inc_enq(struct xhci_hcd *xhci, struct xhci_ring
*ring,
* the TD at the top of the ring.
*/
if (!chain && !more_trbs_coming)
- break;
+ return;
/* If we're not dealing with 0.95 hardware or
* isoc rings on AMD 0.96 host,
>> Oh; and, BTW, a further cleanup would be:
>
> Yes, looking at this there are several cleanups needed. For example inc_enq()
> should actually not bother with event ring at all. HW is the producer in the
> event ring, xhci driver (consumer) should just take care of the dequeue
> pointer.
>
> We shouldn't mix link and last trbs helpers like we currently do, I'd prefer
> functions like:
>
> last_trb_on_seg(trb)
> last_trb_on_ring(trb, ring)
> is_link_trb(trb)
> is_cycle_toggle_link_trb(trb)
are you doing these changes ? If you are, do you mind prepending all of
these functions with xhci_ or something like that ? It really helps
grepping, ftrace, dump_stack(), etc.
--
balbi
signature.asc
Description: PGP signature
