On Wed, 20 Nov 2013, Russell Coker wrote:

> On Wed, 20 Nov 2013, Tim Connors wrote:
> > > Does anyone know what the maximum number of context switches per core you
> > > can expect on xeon level hardware?
> > >
> > > I'm trying to claim we get overloaded when we reach a little less than
> > > 10,000 cswch/s, but we've lost all the historical data.
> >
> > Indeed, is there going to be a maximum for a given piece of hardware (eg,
> > maximum number of interrupts that can be generated per second; time spent
> > in the interrupt handler that all has to be handled by only one CPU,
> > hence explaining why CPU system usage never looks alarming (divide by 8
> > on some servers, by 16 on others); big kernel lock somewhere in the
> > context switch code)?
> >
> > When we have these overloads, nothing else we measure seems to be
> > approaching any limit.  The servers have plenty of CPU left, and there's
> > no real difficulty logging into them.  Anything else I should be looking
> > at?  Fork rate is tiny (1 or 2 per second).  Network bandwidth is fine.
> > Not sure that I've noticed network packet limitations (4k packets per
> > second per host when it failed last time, generating 16000
> > interrupts/second total per host).
>
> What is going wrong in the "overload"?

Something hits a tipping point: the Apache worker slots (3000-6000
depending on hardware specs) rapidly fill up, then Apache stops accepting
new connections and www.bom.gov.au goes dark (since this happens on all
machines in the load-balanced cluster simultaneously).
Whoops!

> Why not just write a context switch benchmark?  It should be simple to have
> 50+ pairs of processes and for each pair have them send a byte to a pipe and
> then wait to receive a byte from another pipe.
>
> http://manpages.ubuntu.com/manpages/hardy/lat_ctx.8.html
>
> From a quick Google search it seems that my above idea has already been
> implemented.
>
> http://blog.tsunanet.net/2010/11/how-long-does-it-take-to-make-context.html
> https://github.com/tsuna/contextswitch
>
> The above looks interesting too.  A google search for the words context,
> switch, and benchmark will find you other things as well.
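
For reference, a minimal sketch of the pipe ping-pong idea quoted above,
assuming a Unix-like system with fork(). It runs a single pair of processes
rather than 50+, and the function name pingpong is made up for illustration.
It is not the code from lat_ctx or from the linked contextswitch repository.

```python
import os
import time


def pingpong(iterations=10000):
    """Bounce one byte between parent and child over two pipes.

    Returns elapsed wall-clock seconds for `iterations` round trips.
    Each round trip makes both processes block and wake once, so it
    costs roughly two context switches when they share a core.
    """
    p2c_r, p2c_w = os.pipe()  # parent -> child
    c2p_r, c2p_w = os.pipe()  # child -> parent

    pid = os.fork()
    if pid == 0:
        # Child: echo every byte back until the parent closes its end.
        os.close(p2c_w)
        os.close(c2p_r)
        while True:
            b = os.read(p2c_r, 1)
            if not b:
                break
            os.write(c2p_w, b)
        os._exit(0)

    # Parent: close the ends it does not use, then time the round trips.
    os.close(p2c_r)
    os.close(c2p_w)
    start = time.perf_counter()
    for _ in range(iterations):
        os.write(p2c_w, b"x")
        os.read(c2p_r, 1)
    elapsed = time.perf_counter() - start

    os.close(p2c_w)  # EOF tells the child to exit
    os.close(c2p_r)
    os.waitpid(pid, 0)
    return elapsed


if __name__ == "__main__":
    n = 10000
    secs = pingpong(n)
    # Assumes two switches per round trip; the figure also includes
    # pipe syscall overhead, so treat it as an overestimate.
    print(f"{n} round trips in {secs:.3f}s, "
          f"about {secs / (2 * n) * 1e6:.2f} us per switch")
```

Running several copies at once, or pinning the pair to one core with
taskset, would get closer to what was suggested; this sketch does neither.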

Believe me I searched.  All the snot in my head seems to be clogging up my
synapses today unfortunately.

But the blog entry looks good.  I imagine that the 140,000
cswitches/second on 16-core machines running httpd+php interpreter is
pretty much a fundamental limit on E5410 level hardware, given that apache
is heavyweight enough that it's going to be more towards the 50,000ns
(50µs) end of the spectrum presented in that blog.
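
A rough back-of-envelope check of that claim, as a sketch only. The
140,000/s total and the 16 cores are the figures above; the per-switch
costs in the loop are assumed values for illustration, not measurements
from these servers, and switch_overhead is a made-up helper name.

```python
def switch_overhead(total_per_sec, cores, cost_us):
    """Return (switches per core per second, fraction of each core
    spent switching) for an assumed per-switch cost in microseconds."""
    per_core = total_per_sec / cores
    fraction = per_core * cost_us / 1e6
    return per_core, fraction


if __name__ == "__main__":
    # Per-switch costs below are assumptions, chosen to span a plausible range.
    for cost_us in (5, 30, 50):
        per_core, fraction = switch_overhead(140_000, 16, cost_us)
        print(f"{cost_us:>2} us/switch: {per_core:.0f} switches/core/s, "
              f"{fraction:.1%} of each core")
```

At 8,750 switches per core per second, a cost in the tens of microseconds
eats a large slice of every core, which fits with the overload showing up
a little under 10,000 cswch/s per core.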

Now I just have to convince the powers that be that php is a stupid thing
to rely on when you don't have to, and it's obviously that recent change
that broke the system that formerly coped with many times the amount of
traffic that it now croaks on.

Now I've got some benchmarks to run.  I mean, fight some fires.

-- 
Tim Connors
_______________________________________________
luv-main mailing list
[email protected]
http://lists.luv.asn.au/listinfo/luv-main