On 10/09/13(Tue) 07:15, RD Thrush wrote:
> On 09/10/13 04:42, Martin Pieuchot wrote:
> > [...]
> >
> > Thanks for this detailed bug report.
> > 
> > You're saying that you have 2 amd64 systems with the same problem, but
> > I see only the dmesg for one machine.  Does the other have the same ehci
> > controller?
> 
> Apparently one is ATI and the other Intel.  
> <http://arp.thrush.com/openbsd/ehci_idone/v1/> has two console captures, 
> "v1.1" and "v1.2", for the other machine after an ehci_idone hang (I hadn't 
> made the panic patch yet).  I was able to generate a ddb interrupt to stop 
> the spew and gather some additional ddb info.  The aforementioned directory 
> also has acpidump, pcidump, biosdecode, and dmidecode previously collected 
> from the same kernel.
> 
> If you want/need further info about the 'v1' machine, let me know and I'll 
> boot OpenBSD and get the info.

It would be nice if you could reproduce the manipulation you did with
the other machine and set ehcidebug to 5 before switching your kvm.

> > The problem you are seeing is related to the way ehci transfers are
> > aborted.  The abort process is subtly broken.
> > 
> > For the archives, what happens in your case is that the timeout for
> > one of the transfers fires and enqueues an abort task (ehci_timeout
> > in your log).  When this abort task gets scheduled, it tries to
> > deactivate the qTDs, asks for an Interrupt on Async Advance Doorbell
> > and goes to sleep (ehci_sync_hc in your log).
> > Then the interrupt happens (ehci_intr1: door bell), wakes up the
> > task and goes into the soft interrupt path to process the finished
> > transfers.  Here the driver discovers that the transfer that timed
> > out is finished (whoa!) and tries to handle it.  But since this
> > transfer has been marked as TIMEOUT (ehci_idone: aborted in your
> > log), it does nothing and bails.
> > 
> > Apparently the abort task never gets rescheduled and your transfer
> > is never removed from the list, certainly because the hardware
> > keeps interrupting your systems, so you're livelocked ;)
> > 
> > But all of that happens because a timeout fires for one of your
> > transfers.  Apparently some ATI controllers need one more quirk,
> > as your problem looks like a dropped interrupt.  Does the diff
> > below help?
> 
> Thank you very much for the detailed analysis and patch.  I'll build a 
> -current kernel and try it.
> 
> Would there be a complementary patch for the (above) Intel ehci controller?

I'm not even sure this will avoid your problem.  A proper fix would be to
stop trying to deactivate the transfer descriptors, as it obviously
doesn't work, and just remove them from the list.  Does anybody want to
take the time to do that? :)
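
To make the difference concrete, here is a toy model of the two abort
strategies.  It is only a sketch, not driver code: every name in it
(struct xfer, softintr_pass, abort_by_deactivate, abort_by_unlink,
passes_until_empty) is made up for illustration and none of it is
taken from ehci(4).  It assumes the hardware keeps reporting the
timed-out transfer as finished, as in the log.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the real ehci(4) structures. */
enum xfer_status { XFER_IN_PROGRESS, XFER_TIMEOUT };

struct xfer {
	enum xfer_status status;
	bool hw_done;		/* hardware reports the transfer as finished */
	struct xfer *next;
};

/* Unlink x from the singly linked list rooted at *head, if present. */
static void
xfer_unlink(struct xfer **head, struct xfer *x)
{
	struct xfer **pp;

	for (pp = head; *pp != NULL; pp = &(*pp)->next) {
		if (*pp == x) {
			*pp = x->next;
			x->next = NULL;
			return;
		}
	}
}

/*
 * One soft interrupt pass: complete finished transfers, but skip the
 * ones marked TIMEOUT.  Returns how many transfers stay on the list.
 */
static size_t
softintr_pass(struct xfer **head)
{
	struct xfer *x, *next;
	size_t left = 0;

	for (x = *head; x != NULL; x = next) {
		next = x->next;
		if (x->hw_done && x->status != XFER_TIMEOUT)
			xfer_unlink(head, x);	/* normal completion */
		else
			left++;			/* skipped, still queued */
	}
	return left;
}

/* Abort as described above: mark the transfer, leave it on the list. */
static void
abort_by_deactivate(struct xfer **head, struct xfer *x)
{
	(void)head;
	x->status = XFER_TIMEOUT;
}

/* Suggested alternative: mark the transfer and unlink it right away. */
static void
abort_by_unlink(struct xfer **head, struct xfer *x)
{
	x->status = XFER_TIMEOUT;
	xfer_unlink(head, x);
}

/*
 * Run up to max_passes soft interrupt passes.  Returns the pass on
 * which the list was found empty, or -1 if it never drained.
 */
static int
passes_until_empty(struct xfer **head, int max_passes)
{
	int i;

	assert(max_passes >= 0);
	for (i = 1; i <= max_passes; i++) {
		if (softintr_pass(head) == 0)
			return i;
	}
	return -1;
}
```

With a single finished transfer on the list, abort_by_deactivate()
leaves it queued however many passes run, so passes_until_empty()
gives up and returns -1.  That is the livelock in miniature.  After
abort_by_unlink() the list is already empty on the first pass.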

Otherwise you can just buy a non-crappy kvm ;)

M.
