On 2010/09/22 10:04, James Peltier wrote:
> ----- Original Message ----
> 
> > From: Stuart Henderson <s...@spacehopper.org>
> > To: Andre Keller <a...@list.ak.cx>
> > Cc: misc@openbsd.org
> > Sent: Wed, September 22, 2010 8:44:26 AM
> > Subject: Re: em(4) ierrs [solved]
> > 
> > On 2010/09/22 17:38, Andre Keller wrote:
> > > Hi Stuart
> > > 
> > > On 21.09.2010 01:28, Stuart Henderson wrote:
> > > > I would try wbng first. Failing that, lm. I doubt you would
> > > > need to disable ichiic but that would be the next step if there's
> > > > no improvement.
> > > 
> > > Well, disabling wbng seems to be the solution. After one day of normal
> > > traffic levels we do not see any Ierrs anymore...
> > > 
> > > Thank you Stuart for the helpful advice.
> > > 
> > > 
> > > Can somebody explain how this driver (which is for getting voltage
> > > levels, fan speeds etc., if I did not misinterpret the manpage) is
> > > causing this strange behavior? I'm just curious...
> > 
> > Great, thanks for the feedback.
> > 
> > If any code ties up the kernel for too long, it can't handle
> > other tasks in a timely fashion.
> > 
> >
> 
> I, unfortunately, am still experiencing livelocks on my em interfaces on my
> Dell R200 server in bridging mode. I'm going to have to schedule an upgrade
> to the latest snapshot first to see if that clears up any issues, but barring
> that I'm not sure where to look. Perhaps I'll also try the UP kernel.

The "livelock" counter means a timeout wasn't processed in time,
indicating the system was too busy to run userland.
(See m_cltick(), m_cldrop() etc. in sys/kern/uipc_mbuf.c,
and the video from AsiaBSDCon starting about 15 minutes into
http://www.youtube.com/watch?v=fv-AQJqUzRI).

When this happens, NICs with drivers using the MCLGETI mechanism
halve the size of their receive rings so that packets are dropped
earlier, which limits system load more effectively than letting
them proceed up the network stack.

So for some reason or other the timeout wasn't processed
quickly enough, and the system responds in this way to limit
the overload. The challenge is to identify what causes
the system to become unresponsive (it could be in the network
stack, or it could be something else) and work out ways
to alleviate that.
