On 2010/09/22 10:04, James Peltier wrote:
> ----- Original Message ----
> > From: Stuart Henderson <s...@spacehopper.org>
> > To: Andre Keller <a...@list.ak.cx>
> > Cc: misc@openbsd.org
> > Sent: Wed, September 22, 2010 8:44:26 AM
> > Subject: Re: em(4) ierrs [solved]
> >
> > On 2010/09/22 17:38, Andre Keller wrote:
> > > Hi Stuart
> > >
> > > On 21.09.2010 01:28, Stuart Henderson wrote:
> > > > I would try wbng first. Failing that, lm. I doubt you would
> > > > need to disable ichiic but that would be the next step if there's
> > > > no improvement.
> > >
> > > well disabling wbng seems to be the solution. After one day of normal
> > > traffic levels we do not see any Ierrs anymore...
> > >
> > > Thank you Stuart for the helpful advice.
> > >
> > > Can somebody explain how this driver (which is for getting voltage
> > > levels, fan speeds etc, if i did not misinterpret the manpage) is
> > > causing this strange behavior? I'm just curious...
> >
> > Great, thanks for the feedback.
> >
> > If any code ties up the kernel for too long, it can't handle
> > other tasks in a timely fashion.
>
> I, unfortunately, am still experiencing livelocks on my em interfaces on
> my Dell R200 server in bridging mode. I'm going to have to schedule an
> upgrade to the latest snapshot first to see if that clears up any issues,
> but barring that I'm not sure where to look. Perhaps I'll also try the
> UP kernel.

The "livelock" counter means a timeout didn't run in time, which indicates
the system was too busy to run userland. (See m_cltick(), m_cldrop() etc.
in sys/kern/uipc_mbuf.c, and the video from AsiaBSDCon starting about
15 minutes into http://www.youtube.com/watch?v=fv-AQJqUzRI.)

When this happens, NICs with drivers using the MCLGETI mechanism halve the
size of their receive rings, so that packets are dropped earlier. This
limits system load more effectively than letting them proceed up the
network stack.

So for some reason the timeout wasn't processed quickly enough, and the
system responds this way to limit the overload. The challenge is to
identify what causes the system to become non-responsive (it could be in
the network stack, or it could be something else) and to work out ways to
alleviate that.
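
To make the mechanism a bit more concrete, here is a rough toy model in Python. It is not the code from sys/kern/uipc_mbuf.c; the class names, the once-per-tick timeout, the "more than one tick late" check and the minimum ring size are all assumptions made for illustration. It only shows the idea described above: a timeout that runs late is counted as a livelock, and each receive ring is halved in response.

```python
from dataclasses import dataclass


@dataclass
class RxRing:
    """Toy receive ring; `limit` is how many buffers the driver may post."""
    limit: int
    floor: int = 4  # assumed lower bound so the ring never shrinks to zero

    def halve(self) -> None:
        # Shrink the ring so excess packets are dropped at the NIC.
        self.limit = max(self.floor, self.limit // 2)


class LivelockDetector:
    """Hypothetical stand-in for a timeout that should run every tick."""

    def __init__(self, rings):
        self.rings = rings
        self.last_run = 0   # tick count when the timeout last ran
        self.livelocks = 0  # how many times the timeout ran late

    def tick_timeout(self, now: int) -> None:
        # If more than one tick passed since the last run, the system was
        # too busy to process the timeout on time: count a livelock and
        # halve every ring.
        if now - self.last_run > 1:
            self.livelocks += 1
            for ring in self.rings:
                ring.halve()
        self.last_run = now


if __name__ == "__main__":
    ring = RxRing(limit=256)
    det = LivelockDetector([ring])
    # The timeout runs on time at ticks 1-3 and 8, late at ticks 7 and 12.
    for now in (1, 2, 3, 7, 8, 12):
        det.tick_timeout(now)
    print(det.livelocks, ring.limit)
```

In this model the counter only tells you that the timeout ran late, not why, which is why the real work is finding whatever kept the system busy.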