On Mon, 23 Jun 2003, Duncan Sands wrote: > On Sunday 22 June 2003 02:00, Johannes Erdfelt wrote: > > On Sun, Jun 22, 2003, Duncan Sands <[EMAIL PROTECTED]> wrote: > > > > > > I have tracked the problem down further. What happens is that the > > > host controller is still reading a TD when uhci_free_td is called on it. > > > uhci_free_td calls pci_pool_free, which (because slab debugging > > > is turned on) does > > > memset (vaddr, POOL_POISON_FREED, pool->size); > > > This causes the host controller to barf, doubtless because it was > > > reading that TD when it got poisoned. > > > > > > In summary: > > > (1) I get > > > e400: host controller process error. something bad happened > > > e400: host controller halted. very bad > > > when queuing multiple urbs to an endpoint (2.5 kernels). > > > (2) What happens is: > > > uhci_irq -> > > > uhci_finish_completion -> > > > uhci_finish_urb -> > > > uhci_destroy_urb_priv -> > > > uhci_free_td -> > > > pci_pool_free: > > > memset (vaddr, POOL_POISON_FREED, pool->size); > > > > > > Now to find out why that TD is still in use... > > > > Interesting. > > > > That path should mean that the URB is finished and that the HC shouldn't > > be reading the TD anymore. > > > > Does this only happen with IN transfers? Perhaps only when the read is > > short? > > > > I'll need to look at the code, but maybe we don't wait the necessary > > frame after updating the qh before we start freeing the TDs for the URB > > we just completed. > > > > Hmm, off to read the sources. > > Hi Johannes, it seems to be exactly as you suspected: short IN. > Here are some schedule dumps which show what happens. I had to > dump after uhci_finish_urb finishes (and not before, or in the middle) > due to timing problems: too much delay in the wrong place and the > process error no longer occurs.
Very interesting. The UHCI Design Guide document is _not_ very specific about what happens with short packets. It does say that if the SPD bit is set then the TD is marked inactive, the QH element ptr is not updated, and an interrupt is requested. It doesn't say what happens if the SPD bit isn't set; I'm guessing that it does the same thing but doesn't request an interrupt. One could easily argue that this behavior is a design flaw and that the QH element ptr's Terminate bit should get set. The Design Guide also doesn't say how the schedule execution works when the HC retrieves a TD marked inactive. I'll assume that it doesn't process the TD any farther but just follows the QH horizontal link ptr. It _does_ say this (table 3.2.2 bit 29 = SPD): "The behavior is undefined when this bit is set with output packets or packets outside of queues." I think it's pretty clear that this means uhci_submit_common() isn't setting SPD correctly. It should _always_ be set for queued (not iso.) IN packets regardless of URB_SHORT_NOT_OK, which is handled in uhci_result_common(). It should _never_ be set for OUT packets. Of course, that's not the cause of the problem here. So look at what happens in uhci_irq(). It calls uhci_transfer_result(), which in turn calls uhci_unlink_generic() for completed URBs. That calls uhci_delete_queued_urb(), which unlinks the URB's QH from the schedule. Then later on uhci_irq() calls uhci_finish_completion(), which in turn calls uhci_finish_urb() for each completed URB. That calls uhci_destroy_urb_priv(), which among other things deallocates the URB's TDs. Apparently there's an assumption here that once an URB is completed, the HC scheduler won't look at its TDs any more. But that's not necessarily true. Consider first that the SPD interrupt doesn't arrive until the end of the frame. The CPU won't start processing the interrupt until the HC has already begun the next frame, and if the schedule isn't very full then the HC may easily reach the URB's QH before uhci_delete_queued_urb() has managed to unlink it. With FSBR turned on, the HC may even read that QH several times before it gets unlinked. Consider second that if the URB completes because of a short read, the QH's element ptr _won't_ point to the next queued QH; it will still point to the TD that got the short packet and is now marked inactive. (That's true even if there isn't another URB queued for the same endpoint. And it's also true for URBs that complete because of an error, not just a short read.) As far as the scheduler is concerned that won't make any difference; when it sees the inactive TD it will ignore it and follow the QH link ptr anyway. But the HC still has to decode that TD! The upshot of all this is that when uhci_destroy_urb_priv() removes and frees the TD, the HC might be about to read it. Now I don't know how likely that is since there's clearly a race; the CPU would have to unlink the QH and free the TD all during the time the HC is processing the QH. But it would explain what Duncan is seeing. It's exactly as Johannes said; the driver doesn't wait for the next interrupt after unlinking the QH before freeing the TDs. Would it make more sense to store the td_list header in the uhci_qh rather than in the urb_priv? That way the TDs could be freed along with the QH in uhci_free_pending_qhs(), which does execute at the right time. Maybe there's a less invasive way to accomplish the same thing, but I can't think of one. Alan Stern ------------------------------------------------------- This SF.Net email is sponsored by: INetU Attention Web Developers & Consultants: Become An INetU Hosting Partner. Refer Dedicated Servers. We Manage Them. You Get 10% Monthly Commission! INetU Dedicated Managed Hosting http://www.inetu.net/partner/index.php _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel