David Brownell wrote:
> 
> 
> Kevin Brosius wrote:
> >
> > After some testing with the ohci driver, I'm starting to suspect this is
> > a documented problem in the AMD766 controller on the motherboard.
> > Referring to
> > http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_873_1130,00.html
> > and their "AMD-766 Peripheral Bus Controller Data Sheet", I am
> > suspicious of Erratum Number 20.  To my understanding this describes a
> > case where the HccaDoneHead may not be updated at the end of a frame.
> > The Erratum fails to mention if the WDH (OHCI_INTR_WDH) interrupt will
> > still occur or not.
> 
> You mean the "Revision Guide", which is where the errata are listed.
> Right, erratum 18 doesn't look to be much trouble, unlike erratum 20.

Yes, sorry.

> 
> Isn't this the MP chipset where most mobo vendors said "USB doesn't
> work", and shipped add-in cards (like NEC EHCI) so customers would
> have a working solution, when they needed it?
> 

I don't think so.  I believe that was the 760MPX, rather than the 760MP
I'm working with.  I just checked, and AMD's errata bear this out.
Take a look at
http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_873_4296,00.html
and the Revision Guide for the 768 chip.  You'll find Erratum 26, "USB
Controller may cause secondary PCI bus contention".  The suggested
workaround is disabling the on board USB controller and supplying a plug
in card.  I'm glad you asked, because I hadn't followed up on checking
this yet.

Another point of interest: testing under Windows XP showed the USB audio
to be stable, so I figured we ought to be able to do it on the Linux
side also.

> > I've spent some time instrumenting ohci-hcd.c and ohci-q.c in
> > drivers/usb/host, and it looks like I do not get a WDH interrupt when I
> > see the failure.  I'm not sure I've proved this conclusively, as it's
> > possible the WDH interrupt occurs, but the ohci->hcca->done_head
> > register is 0.  The Erratum suggests the pointer will be lost and never
> > written into DoneHead.  I'd guess that means the register will be 0 from
> > the last time the driver cleared it, and the HC never updated it.
> 
> Sounds about right.  Except that the HCCA is main memory, and WDH will
> never happen unless there was a non-null donelist head to write.  (There
> is a real register, which HCCA doesn't exactly shadow.)

OK, that makes more sense.  I wasn't clear on the memory vs. register
distinction for the donehead locations.
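
To keep the two locations straight, here is a minimal sketch of the HCCA
layout as I read the OHCI spec.  The struct and function names
(hcca_sketch, hcca_done_td, and so on) are made up for illustration and
are not the driver's definitions; only the offsets come from the spec.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: hypothetical names, offsets per the OHCI spec.
 * The HCCA is a 256-byte block in main memory that the HC writes into.
 * It is separate from the memory-mapped HcDoneHead register. */
struct hcca_sketch {
    uint32_t int_table[32];  /* 0x00: heads of the periodic ED lists */
    uint16_t frame_no;       /* 0x80: current frame number */
    uint16_t pad1;           /* 0x82: zeroed by the HC with frame_no */
    uint32_t done_head;      /* 0x84: HccaDoneHead, written before WDH */
    uint8_t  reserved[120];  /* 0x88: remainder of the 256-byte block */
};

/* TDs are at least 16-byte aligned, so the low four bits are not address. */
static uint32_t hcca_done_td(uint32_t done_head)
{
    return done_head & ~(uint32_t)0xf;
}

/* Bit 0 set means other interrupt status was pending at writeback time. */
static int hcca_other_irq_pending(uint32_t done_head)
{
    return (int)(done_head & 1u);
}

/* Sanity check that the sketch matches the offsets listed above. */
static void hcca_check_layout(void)
{
    assert(offsetof(struct hcca_sketch, frame_no) == 0x80);
    assert(offsetof(struct hcca_sketch, done_head) == 0x84);
    assert(sizeof(struct hcca_sketch) == 256);
}
```

The driver only ever sees done_head in memory; the register copy is what
the HC accumulates internally until it performs the writeback.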

> 
> Basically what happens is that all recent work the HC did, and tracked
> in ohci->regs->donehead, gets lost forever when the controller bungles
> writing that into ohci->hcca->done_head (sometimes, if SOF is pending).
> 
> > I don't see any straightforward ways to work around this problem.  Am I
> > correct in my understanding that the ohci driver will lock in the case
> > of a missed WDH interrupt w/lost DoneHead pointer?  It looks to me like
> > the driver relies on the DoneHead coming back from the HC successfully,
> > and cannot recover without it (dl_reverse_done_list() in ohci-hcd.c).
> 
> Since hcca->done_head is documented as the primary way completed TDs
> get returned to the driver, and it's the only one the OHCI driver
> currently uses, yes that'll cause much trouble.
> 
> Given the details in erratum 20, I can imagine trying to work around
> (and detect!) it, using a quirk flag to enable logic in the irq handler
> so it goes something like:
> 
> - nothing extra to do if SOF isn't enabled, else:
> 
> - read ohci->regs->donehead before clearing WDH status;
>    nothing to do if it's zero.  (There'd still be a race
>    when a TD completes quickly enough, though it wouldn't
>    be all that common at USB 1.1 speeds.)
> 
> - process the donelist. (gives the hardware time to
>    bungle the write, if it's going to do so).
> 
> - use that saved donehead value to determine if the
>    write got bungled:  the saved value was nonzero and the
>    current ohci->regs->donehead is zero, but
>    hcca->done_head is zero too.  (there'd be a few
>    other cases too, but that's the obvious one.)
> 
> - if it bungled, recover.  maybe just by treating that
>    saved value as a new donelist, reversing it etc;
> 

I'll do some more testing to see if an enabled SOF happens at the same
time.  I don't think it does, so maybe that makes this a different
problem.
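
For reference, the detection steps listed above can be sketched roughly
as follows.  This is not driver code: donehead_view and the three
functions are hypothetical names, the two fields stand in for
ohci->regs->donehead and ohci->hcca->done_head, and only the "obvious"
lost-write case from the list is covered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the two donehead locations. */
struct donehead_view {
    uint32_t regs_donehead;   /* stands in for ohci->regs->donehead */
    uint32_t hcca_done_head;  /* stands in for ohci->hcca->done_head */
};

/* Sample the register before WDH status is cleared.
 * Zero means the HC has nothing queued, so there is nothing to lose. */
static uint32_t save_donehead(const struct donehead_view *v)
{
    return v->regs_donehead;
}

/* After the donelist has been processed, decide whether the writeback
 * was lost: work was pending, the register has since been emptied,
 * yet nothing arrived in the HCCA.  A TD that completes between the
 * sample and the check can still race with this test. */
static bool donehead_write_lost(bool sof_enabled, uint32_t saved,
                                const struct donehead_view *now)
{
    if (!sof_enabled || saved == 0)
        return false;
    return now->regs_donehead == 0 && now->hcca_done_head == 0;
}

/* Recovery candidate: treat the saved value as the head of a donelist,
 * to be reversed and processed like a normal one. */
static uint32_t recover_done_head(uint32_t saved)
{
    assert(saved != 0);  /* only meaningful after a detected loss */
    return saved & ~(uint32_t)0xf;
}
```

The quirk flag would gate all of this, so controllers without the
erratum never pay for the extra register read.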

> Though I'm not sure how this erratum would cause one
> of your symptoms:  no more WDH interrupts.  Maybe it
> ties in to erratum 18, and you'd need to write the
> normally read-only regs->donehead.  Or maybe that's
> just because your device wasn't resubmitting; does
> something like "lsusb" fail too?
> 

Yes, after the problem has occurred.  I hadn't tried this before, but an
lsusb -v shows:

Jan  4 07:30:22 sea kernel: drivers/usb/host/ohci-dbg.c: SUB da4ac440 dev:2,ep=0-I,CTRL,flags:0,len:0/256,stat:-115
Jan  4 07:30:22 sea kernel: drivers/usb/core/message.c: usb_control/bulk_msg: timeout
Jan  4 07:30:22 sea kernel: drivers/usb/host/ohci-dbg.c: UNLINK da4ac440 dev:2,ep=0-I,CTRL,flags:0,len:0/256,stat:-2

and hangs, printing only:
/home/cobra # lsusb -v

Bus 001 Device 002: ID 041e:3000 Creative Labs 


That takes it out of the usb audio driver domain, right?  Or is the
audio driver part of the lsusb device probe?  I'll do some more
testing...

> > With my limited knowledge of the driver, it seems possible to enable the
> > SOF interrupts and then maybe keep the done list up to date at that
> > time.  That seems like an excessive penalty to pay in additional
> > interrupts though.
> 
> SOF is what causes the trouble ... you want to avoid using it more.

Hmm.  Enabling SOF on every frame seems to make no difference in
the failure mode here.  But I doubt I've done enough testing to
prove it either way.

> 
> > Any thoughts or suggestions on the best way to proceed?  I'm presently
> > pursuing being able to recover any way possible, probably by maintaining
> > the TD/URB list in the driver, so that I can check the list on SOF
> > interrupts and catch the missing DoneHead entry.  Is that possible?
> > This is mostly to prove the problem, rather than as an optimal fix.
> 
> See above ... that should help confirm whether this really is what you're
> seeing.  Which is certainly a good idea.  If it isn't, then it'd seem you
> are finding some new bug (might still be hardware).
> 
> The OHCI driver already maintains a TD list (ed->td_list) in the 2.5 code.
> I've even thought about changing the OHCI driver so it scans the schedule,
> much like how ehci-hcd works (or even uhci-hcd).  That schedule is the ED
> lists in ohci->ed_{control,bulk}tail and ohci->periodic[].  Basically we
> know that at most one td in ed->td_list is at ed->hwHeadP, and also that
> any TD before that has been retired by the hardware ... regardless of
> what the donelist tells us (or doesn't).
> 
> That is, the donelist mechanism is just letting us avoid scanning the
> active EDs:  it's more efficient.  But in cases where the donelist
> support touches hardware bugs (I've heard other reports of this, for
> non-PCI hardware), it's possible to find completed TDs another way.
> (Mixing the two modes would be unhealthy.)

That's helpful information.  I hadn't gotten familiar enough with the
driver to realize the TD list was still available locally.  But that
will help if we need to verify the donelist.  I'll do some more testing.
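
As a note to myself, a minimal sketch of the scan described above,
assuming the TDs on an ED are kept oldest first.  The types and names
here (td_sketch, ed_sketch, count_retired) are hypothetical
simplifications, not the driver's structures, and treating "head not
found in the list" as "everything retired" is my assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the driver's TD and ED. */
struct td_sketch {
    uint32_t dma;                 /* bus address of this TD */
};

struct ed_sketch {
    uint32_t hw_head;             /* like ed->hwHeadP: next TD for the HC */
    const struct td_sketch *tds;  /* like ed->td_list, oldest first */
    size_t n_tds;
};

/* The low bits of HeadP hold the Halted (bit 0) and toggleCarry (bit 1)
 * flags, so they are masked off before comparing addresses. */
#define HEAD_ADDR_MASK (~(uint32_t)0xf)

/* Count the TDs the HC has already retired: everything queued ahead of
 * the TD that hw_head points at, whatever the donelist says.
 * Assumption: if hw_head matches no listed TD, all of them are retired. */
static size_t count_retired(const struct ed_sketch *ed)
{
    uint32_t head;
    size_t i;

    assert(ed != NULL);
    head = ed->hw_head & HEAD_ADDR_MASK;
    for (i = 0; i < ed->n_tds; i++)
        if (ed->tds[i].dma == head)
            break;
    return i;
}
```

A real version would walk every ED in the control, bulk and periodic
schedules, which is where the extra cost Dave mentions comes from.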

> 
> I've avoided that change since it would destabilize things, as well
> as slow things down a bit.  Plus, unlike most of the other 2.5 changes,
> it hasn't (so far) been necessary to fix bugs.
> 
> - Dave

-- 
Kevin

