On Mon, 23 Jun 2003, Duncan Sands wrote:

> On Sunday 22 June 2003 02:00, Johannes Erdfelt wrote:
> > On Sun, Jun 22, 2003, Duncan Sands <[EMAIL PROTECTED]> wrote:
> > >
> > > I have tracked the problem down further.  What happens is that the
> > > host controller is still reading a TD when uhci_free_td is called on it.
> > > uhci_free_td calls pci_pool_free, which (because slab debugging
> > > is turned on) does
> > > memset (vaddr, POOL_POISON_FREED, pool->size);
> > > This causes the host controller to barf, doubtless because it was
> > > reading that TD when it got poisoned.
> > >
> > > In summary:
> > > (1) I get
> > > e400: host controller process error. something bad happened
> > > e400: host controller halted. very bad
> > > when queuing multiple urbs to an endpoint (2.5 kernels).
> > > (2) What happens is:
> > > uhci_irq ->
> > > uhci_finish_completion ->
> > > uhci_finish_urb ->
> > > uhci_destroy_urb_priv ->
> > > uhci_free_td ->
> > > pci_pool_free:
> > > memset (vaddr, POOL_POISON_FREED, pool->size);
> > >
> > > Now to find out why that TD is still in use...
> >
> > Interesting.
> >
> > That path should mean that the URB is finished and that the HC shouldn't
> > be reading the TD anymore.
> >
> > Does this only happen with IN transfers? Perhaps only when the read is
> > short?
> >
> > I'll need to look at the code, but maybe we don't wait the necessary
> > frame after updating the qh before we start freeing the TDs for the URB
> > we just completed.
> >
> > Hmm, off to read the sources.
> 
> Hi Johannes, it seems to be exactly as you suspected: short IN.
> Here are some schedule dumps which show what happens.  I had to
> dump after uhci_finish_urb finishes (and not before, or in the middle)
> due to timing problems: too much delay in the wrong place and the
> process error no longer occurs.

Very interesting.  The UHCI Design Guide document is _not_ very specific
about what happens with short packets.  It does say that if the SPD bit is
set then the TD is marked inactive, the QH element ptr is not updated, and
an interrupt is requested.  It doesn't say what happens if the SPD bit
isn't set; I'm guessing that it does the same thing but doesn't request
an interrupt.  One could easily argue that this behavior is a design flaw 
and that the QH element ptr's Terminate bit should get set.

The Design Guide also doesn't say how the schedule execution works when 
the HC retrieves a TD marked inactive.  I'll assume that it doesn't 
process the TD any farther but just follows the QH horizontal link ptr.

It _does_ say this (table 3.2.2 bit 29 = SPD): "The behavior is undefined
when this bit is set with output packets or packets outside of queues."  
I think it's pretty clear that this means uhci_submit_common() isn't
setting SPD correctly.  It should _always_ be set for queued (not iso.) IN
packets regardless of URB_SHORT_NOT_OK, which is handled in
uhci_result_common().  It should _never_ be set for OUT packets.  Of
course, that's not the cause of the problem here.


So look at what happens in uhci_irq().  It calls uhci_transfer_result(), 
which in turn calls uhci_unlink_generic() for completed URBs.  That calls 
uhci_delete_queued_urb(), which unlinks the URB's QH from the schedule.

Then later on uhci_irq() calls uhci_finish_completion(), which in turn 
calls uhci_finish_urb() for each completed URB.  That calls 
uhci_destroy_urb_priv(), which among other things deallocates the URB's 
TDs.

Apparently there's an assumption here that once an URB is completed, the
HC scheduler won't look at its TDs any more.  But that's not necessarily
true.  Consider first that the SPD interrupt doesn't arrive until the end
of the frame.  The CPU won't start processing the interrupt until the HC
has already begun the next frame, and if the schedule isn't very full then
the HC may easily reach the URB's QH before uhci_delete_queued_urb() has
managed to unlink it.  With FSBR turned on, the HC may even read that QH
several times before it gets unlinked.

Consider second that if the URB completes because of a short read, the
QH's element ptr _won't_ point to the next queued QH; it will still point
to the TD that got the short packet and is now marked inactive.  (That's
true even if there isn't another URB queued for the same endpoint.  And
it's also true for URBs that complete because of an error, not just a
short read.)  As far as the scheduler is concerned that won't make any
difference; when it sees the inactive TD it will ignore it and follow the
QH link ptr anyway.  But the HC still has to decode that TD!

The upshot of all this is that when uhci_destroy_urb_priv() removes and
frees the TD, the HC might be about to read it.  Now I don't know how
likely that is since there's clearly a race; the CPU would have to unlink
the QH and free the TD all during the time the HC is processing the QH.  
But it would explain what Duncan is seeing.

It's exactly as Johannes said; the driver doesn't wait for the next
interrupt after unlinking the QH before freeing the TDs.

Would it make more sense to store the td_list header in the uhci_qh
rather than in the urb_priv?  That way the TDs could be freed along with 
the QH in uhci_free_pending_qhs(), which does execute at the right time.  
Maybe there's a less invasive way to accomplish the same thing, but I 
can't think of one.

Alan Stern



-------------------------------------------------------
This SF.Net email is sponsored by: INetU
Attention Web Developers & Consultants: Become An INetU Hosting Partner.
Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission!
INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to