David Brownell wrote:
> 
> 
> Kevin Brosius wrote:
> >
> > After some testing with the ohci driver, I'm starting to suspect this is
> > a documented problem in the AMD766 controller on the motherboard.
> > Referring to
> > http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_873_1130,00.html
> > and their "AMD-766 Peripheral Bus Controller Data Sheet", I am
> > suspicious of Erratum Number 20.  To my understanding this describes a
> > case where the HccaDoneHead may not be updated at the end of a frame.
> > The Erratum fails to mention if the WDH (OHCI_INTR_WDH) interrupt will
> > still occur or not.
> 
> You mean the "Revision Guide", which is where the errata are listed.
> Right, erratum 18 doesn't look to be much trouble, unlike erratum 20.
> 
> Isn't this the MP chipset where most mobo vendors said "USB doesn't
> work", and shipped add-in cards (like NEC EHCI) so customers would
> have a working solution, when they needed it?
> 
> > I've spent some time instrumenting ohci-hcd.c and ohci-q.c in
> > drivers/usb/host, and it looks like I do not get a WDH interrupt when I
> > see the failure.  I'm not sure I've proved this conclusively, as it's
> > possible the WDH interrupt occurs, but the ohci->hcca->done_head
> > register is 0.  The Erratum suggests the pointer will be lost and never
> > written into DoneHead.  I'd guess that means the register will be 0 from
> > the last time the driver cleared it, and the HC never updated it.
> 
> Sounds about right.  Except that the HCCA is main memory, and WDH will
> never happen unless there was a non-null donelist head to write.  (There
> is a real register, which HCCA doesn't exactly shadow.)
> 
> Basically what happens is that all recent work the HC did, and tracked
> in ohci->regs->donehead, gets lost forever when the controller bungles
> writing that into ohci->hcca->done_head (sometimes, if SOF is pending).
> 

Well, further testing seems to rule out the erratum.  I don't see any
SOF's, ever, in my case (unless I enable them all the time.)  I should
see one at the same time, or following, the problem WDH if the erratum
is the problem, right?


> > I don't see any straightforward ways to work around this problem.  Am I
> > correct in my understanding that the ohci driver will lock in the case
> > of a missed WDH interrupt w/lost DoneHead pointer?  It looks to me like
> > the driver relies on the DoneHead coming back from the HC successfully,
> > and cannot recover without it (dl_reverse_done_list() in ohci-hcd.c).
> 
> Since hcca->done_head is documented as the primary way completed TDs
> get returned to the driver, and it's the only one the OHCI driver
> currently uses, yes that'll cause much trouble.
> 
> Given the details in erratum 20, I can imagine trying to work around
> (and detect!) it, using a quirk flag to enable logic in the irq handler
> so it goes something like:
> 
> - nothing extra to do if SOF isn't enabled, else:
> 
> - read ohci->regs->donehead before clearing WDH status;
>    nothing to do if it's zero.  (There'd still be a race
>    when a TD completes quickly enough, though it wouldn't
>    be all that common at USB 1.1 speeds.)
> 
> - process the donelist. (gives the hardware time to
>    bungle the write, if it's going to do so).
> 
> - use that saved donehead value to determine if the
>    write got bungled:  current ohci->regs->donehead is
>    zero, but hcca->done_head is too.  (there'd be a few
>    other cases too, but that's the obvious one.)
> 
> - if it bungled, recover.  maybe just by treating that
>    saved value as a new donelist, reversing it etc;
> 
> Though I'm not sure how this erratum would cause one
> of your symptoms:  no more WDH interrupts.  Maybe it
> ties in to erratum 18, and you'd need to write the
> normally read-only regs->donehead.  Or maybe that's
> just because your device wasn't resubmitting; does
> something like "lsusb" fail too?
> 

Yes, lsusb fails also (see prior mail.)
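
For reference, the detection steps Dave lists above could be sketched
roughly as follows.  This is only a standalone model, not driver code:
struct fake_ohci and both function names are hypothetical, and its fields
merely stand in for ohci->regs->donehead and ohci->hcca->done_head.  It
covers only the "obvious" case from the list, not the other cases or the
race Dave mentions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver state; not the real ohci-hcd structs. */
struct fake_ohci {
    uint32_t reg_donehead;   /* models ohci->regs->donehead (HC register) */
    uint32_t hcca_done_head; /* models ohci->hcca->done_head (main memory) */
    int sof_enabled;         /* nonzero while SOF interrupts are enabled */
};

/*
 * Snapshot the register before WDH status is cleared.
 * Returns 0 when there is nothing extra to check: SOF is not enabled,
 * or the register itself is zero.
 */
uint32_t snapshot_donehead(const struct fake_ohci *o)
{
    assert(o != NULL);
    if (!o->sof_enabled)
        return 0;
    return o->reg_donehead;
}

/*
 * Call after the donelist has been processed.  If the register has gone
 * back to zero but nothing was written to the HCCA copy, assume the
 * writeback was lost and return the saved pointer so the caller can
 * treat it as a new donelist.  Returns 0 if nothing looks lost.
 */
uint32_t lost_donehead(const struct fake_ohci *o, uint32_t saved)
{
    assert(o != NULL);
    if (saved == 0)
        return 0;
    if (o->reg_donehead == 0 && o->hcca_done_head == 0)
        return saved;
    return 0;
}
```

A nonzero return from lost_donehead() would be the point where recovery
(reversing the saved list and so on) would start.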

I'm pursuing why the WDH's stop.  I've added some code to indicate depth
of the queue (ohci->periodic[] in my case for ISOC) and to mark entries
in it against the donelist.  I started down this path based on the
possible loss of WDH's from the erratum, and to see if the queue was
empty or full.  Here is a snippet of what I see:

Jan 11 19:32:08 sea kernel: drivers/usb/host/ohci-dbg.c: SUB dfce95c0 dev:2,ep=3-I,ISOC,flags:2,len:0/15,stat:-115
Jan 11 19:32:08 sea kernel: dl done dbcbf040, dbcbf040
Jan 11 19:32:08 sea kernel: dl done Aa>xxxxx..... k
Jan 11 19:32:08 sea kernel: dl done_head 1bcf0201, dbcf0200
Jan 11 19:32:08 sea kernel: drivers/usb/host/ohci-dbg.c: RET dfce9680 dev:2,ep=2-O,ISOC,flags:2,len:768/1215,stat:0
Jan 11 19:32:08 sea kernel: out_cmpl B1
Jan 11 19:32:08 sea kernel: drivers/usb/host/ohci-dbg.c: SUB dfce9680 dev:2,ep=2-O,ISOC,flags:2,len:0/1215,stat:-115
Jan 11 19:32:08 sea kernel: drivers/usb/host/ohci-dbg.c: data(0/1215): 30 06 eb 0d a9 08 dc 14 ed 09 79 16 59 07 97 0c... stat:-115
Jan 11 19:32:08 sea kernel: dl done dbcbf040, dbcbf040
Jan 11 19:32:08 sea kernel: dl done Aa>xxxxx..... k


The 'dl done' entries are for done_list processing.  A line like:

dl done Aa>xxxxx..... k

shows queue depth.  'A' is for periodic[0], 'x' is an entry in both the
periodic list and the done_list, and '.' is an entry only in the periodic
list.  (These are TD pointer comparisons.)  The length of the bar (the
x's and .'s) is the total depth of the periodic chain during the current
WDH, which looks like 10 entries in this case.  The 'k' on the end
indicates the number of done_list entries (11 in this case.)
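
To make the bar format concrete, here is a rough standalone sketch of how
such a line could be built.  It is not the instrumentation code itself:
format_depth_bar() is a hypothetical name, TDs are modeled as plain 32-bit
addresses, and the letter encoding ('a' for 1 entry up to 'k' for 11) is
inferred from the example above.  The '-' for an empty done_list and the
cap at 'z' are assumptions, and the "Aa>" prefix is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Build the "xxxxx..... k" part of a 'dl done' line.
 * periodic[] holds the TD addresses on the periodic chain, done[] the
 * TD addresses on the done_list.  Each periodic entry becomes 'x' if it
 * is also on the done_list, '.' otherwise.  A space and one letter that
 * encodes the done_list length follow.
 * Returns the number of characters written, not counting the NUL.
 */
size_t format_depth_bar(const uint32_t *periodic, size_t n_periodic,
                        const uint32_t *done, size_t n_done,
                        char *out, size_t out_len)
{
    size_t pos = 0;

    /* Room for the bar, the space, the count letter and the NUL. */
    assert(out != NULL && out_len >= n_periodic + 3);

    for (size_t i = 0; i < n_periodic; i++) {
        int in_done = 0;
        for (size_t j = 0; j < n_done; j++) {
            if (periodic[i] == done[j]) {
                in_done = 1;
                break;
            }
        }
        out[pos++] = in_done ? 'x' : '.';
    }

    out[pos++] = ' ';
    if (n_done == 0)
        out[pos++] = '-';  /* assumption: marker for an empty done_list */
    else
        out[pos++] = (char)('a' + (n_done > 26 ? 25 : n_done - 1));
    out[pos] = '\0';
    return pos;
}
```

With 10 periodic entries, of which the first 5 also appear among 11
done_list entries, this writes the same bar and count letter as the log
line above.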

Hmm, I still need to understand this better, I'm not sure if the above
makes sense or not...


> > With my limited knowledge of the driver, it seems possible to enable the
> > SOF interrupts and then maybe keep the done list up to date at that
> > time.  That seems like an excessive penalty to pay in additional
> > interrupts though.
> 
> SOF is what causes the trouble ... you want to avoid using it more.
> 

Testing with SOF enabled for every frame doesn't seem to make the
problem any worse.  That fits my note above about not getting any SOF's,
and about them not being the cause of this problem.


> > Any thoughts or suggestions on the best way to proceed?  I'm presently
> > pursuing being able to recover any way possible, probably by maintaining
> > the TD/URB list in the driver, so that I can check the list on SOF
> > interrupts and catch the missing DoneHead entry.  Is that possible?
> > This is mostly to prove the problem, rather than as an optimal fix.
> 
> See above ... that should help confirm whether this really is what you're
> seeing.  Which is certainly a good idea.  If it isn't, then it'd seem you
> are finding some new bug (might still be hardware).
> 
> The OHCI driver already maintains a TD list (ed->td_list) in the 2.5 code.
> I've even thought about changing the OHCI driver so it scans the schedule,
> much like how ehci-hcd works (or even uhci-hcd).  That schedule is the ED
> lists in ohci->ed_{control,bulk}tail and ohci->periodic[].  Basically we
> know that at most one td in ed->td_list is at ed->hwHeadP, and also that
> any TD before that has been retired by the hardware ... regardless of
> what the donelist tells us (or doesn't).
> 
> That is, the donelist mechanism is just letting us avoid scanning the
> active EDs:  it's more efficient.  But in cases where the donelist
> support touches hardware bugs (I've heard other reports of this, for
> non-PCI hardware), it's possible to find completed TDs another way.
> (Mixing the two modes would be unhealthy.)
> 
> I've avoided that change since it would destabilize things, as well
> as slow things down a bit.  Plus, unlike most of the other 2.5 changes,
> it hasn't (so far) been necessary to fix bugs.
> 
> - Dave
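
A minimal sketch of the schedule-scan idea Dave describes above, as I
understand it: walk an ED's TD list in queue order and treat everything
ahead of the TD that the hardware head pointer names as retired.  This is
a standalone model, not driver code.  count_retired() is a hypothetical
name, the array stands in for ed->td_list, and the second argument stands
in for ed->hwHeadP.  The low four bits are masked off because OHCI TDs are
16-byte aligned and the head pointer carries flag bits there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * td_dma[] lists the bus addresses of an ED's TDs in queue order.
 * hw_head_p models ed->hwHeadP, including its low flag bits.
 * Returns how many TDs sit ahead of the one at the head pointer, which
 * are the ones the hardware has already retired.  Returns -1 if no TD
 * in the list is at the head pointer; what that means is left to the
 * caller.
 */
int count_retired(const uint32_t *td_dma, size_t n, uint32_t hw_head_p)
{
    uint32_t head = hw_head_p & ~(uint32_t)0x0f;

    assert(td_dma != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (td_dma[i] == head)
            return (int)i;
    }
    return -1;
}
```

The real thing would also have to walk the ED lists themselves
(ohci->ed_{control,bulk}tail and ohci->periodic[]), which this leaves out.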

-- 
Kevin

