Adam Kropelin wrote:

Alan Stern wrote:

On Sun, 12 Jun 2005, Adam Kropelin wrote:


So the basic algorithm means we should expect to see bursts of 16
control transfers once every 60 seconds, plus additional bursts
generated in response to interrupt transfers. What's happening in


That does fit the pattern in the log.  I didn't bother to count the
number of transfers in each burst so I can't confirm the number 16,
but it would be easy enough to do so.


I did count, and the bursts are always a multiple of 16.

One thing I can't explain is that I occasionally see bursts of control
transfers repeated with no intervening interrupt transfer. That implies


Maybe it would help to debug both the section of hiddev that decides to
send events and the receiving code in apcupsd (to see what's in those
notifications). Could hiddev be doing this because of some bogus content
or faulty decode of the interrupt data?


It certainly could, although in this case hid-core seems to be at fault. The same code path is followed for processing input reports regardless of whether they result from interrupt transfers or control transfers. That leads to an interrupt event being delivered to hiddev for control transfers, which is wrong. The problem is mitigated by the fact that reporting to hiddev is suppressed if the field value has not changed, which is usually the case. If the field value does change, however, a feedback loop can form: apcupsd polls, more events get generated, and those events lead to more polling. The feedback loop is eventually broken when the value reads back the same twice in a row.
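
To make the feedback loop concrete, here is a minimal toy model in Python. It is only a sketch and not the hid-core code or the attached patch: the names `HidCoreModel`, `process_report`, `count_polls` and the `filter_control` switch are hypothetical, the readings are invented, and each poll is counted as a single step although on the wire it appears as a burst of control transfers. It assumes the poller reacts to every hiddev event with exactly one poll.

```python
INTERRUPT = "interrupt"
CONTROL = "control"
FIELD = "LINEV"  # field name used purely for illustration


class HidCoreModel:
    """Toy model of the report path described above (not kernel code)."""

    def __init__(self, filter_control):
        # filter_control=True models one possible fix (an assumption):
        # reports that arrived via control transfers never raise events.
        self.filter_control = filter_control
        self.last = {}  # last value seen per field

    def process_report(self, field, value, source):
        """Return True if an event would be delivered to hiddev."""
        changed = self.last.get(field) != value
        self.last[field] = value
        if not changed:
            return False  # unchanged values are suppressed
        if self.filter_control and source == CONTROL:
            return False  # polled reports do not generate events
        return True


def count_polls(readings, filter_control):
    """Count polls triggered by one interrupt report.

    readings[0] arrives via an interrupt transfer; each later entry is the
    value returned by the next poll (control transfer).
    """
    core = HidCoreModel(filter_control)
    if not core.process_report(FIELD, readings[0], INTERRUPT):
        return 0
    polls = 0
    for value in readings[1:]:
        polls += 1  # the poller reacts to the previous event
        if not core.process_report(FIELD, value, CONTROL):
            break  # no new event, so the loop ends here
    return polls


if __name__ == "__main__":
    readings = [230, 231, 232, 232, 232]
    print(count_polls(readings, filter_control=False))
    print(count_polls(readings, filter_control=True))
```

With a shared code path, the loop keeps going until two consecutive polls return the same value. When control-transfer reports are kept from raising events, a single interrupt report leads to a single poll.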

I've attached a rather ugly hack to fix the problem in hid-core. I don't know if this alone will solve Martin's lockup, but I'm hoping it does.

Vojtech, this is a fairly gross patch. Do you (or anyone else) have any ideas on how it could be accomplished more cleanly?

--Adam

I disregarded your earlier patch; I don't think it makes sense to patch too much ;-) I assumed this is what you wanted me to test first. If you want me to patch apcupsd at the same time, please drop me a line.

Reading through the previous posts, we already tried not polling the two values, LINEV and LOTRANS, that generated too-short reports. I can definitely say it got better when we did that, but it didn't solve the problem completely. I believe one of our tests had also already established that no other processes are reading or interfering with the USB port; at least that was my understanding.

I hope we will know within the next few days whether this solves the problem. In any case, if there is a more elegant solution, please let me know and I will try it while I still have the equipment. We have quite a bit of voltage fluctuation and blackouts every now and then, but also times when power is quite stable. Perhaps this change of environment explains why it sometimes takes days and sometimes only 20 minutes or so for the error to occur? If there is any need to include kernel printk messages, please send me a patch.

Martin



_______________________________________________
linux-usb-devel@lists.sourceforge.net
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel