Although you say it's a compute server, not an I/O server, those figures
initially scream I/O problems... all that time in sys and a high load
average usually indicate that. Have you got a poorly configured database
on there by any chance?

Context switches and interrupts are pretty high... could be the leap
second screwed up locking if you're running multithreaded apps, although
I don't think there's been one for a few months now? 
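
If it is a multithreaded app spinning on a lock, the per-thread
context-switch counters the kernel keeps in /proc/<pid>/task/<tid>/status
should show which one. A rough Python sketch of how I'd rank them, assuming
a Linux /proc layout; the function names are my own and not from any tool:

```python
import os


def parse_ctxt_switches(status_text):
    """Sum the voluntary and nonvoluntary context-switch counters found in
    the text of a /proc/<pid>/task/<tid>/status file (0 if absent)."""
    wanted = ("voluntary_ctxt_switches", "nonvoluntary_ctxt_switches")
    total = 0
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key in wanted:
            total += int(value)
    return total


def snapshot(proc_root="/proc"):
    """Map (pid, tid) -> (command name, context switches so far)."""
    threads = {}
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were looking
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "status")) as fh:
                    text = fh.read()
            except OSError:
                continue  # thread exited or is not readable
            # The first line of the status file is "Name:\t<command>".
            name = text.split("\n", 1)[0].partition(":")[2].strip()
            threads[(int(pid), int(tid))] = (name, parse_ctxt_switches(text))
    return threads


def busiest(before, after, interval, top=10):
    """Rank threads present in both snapshots by context switches per second."""
    rates = []
    for key, (name, count) in after.items():
        if key in before:
            rates.append(((count - before[key][1]) / interval, key, name))
    rates.sort(reverse=True)
    return rates[:top]
```

Take one snapshot(), sleep a few seconds, take another, and hand both to
busiest() along with the interval. If sysstat is installed, "pidstat -w -t 1"
should report much the same thing without any scripting.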

What info do you have in /proc/interrupts? It should give an idea of which
resource is being hit. Does (h)top identify the processes causing the
problem? If so, have you tried running strace on them?
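
Since the counters in /proc/interrupts are cumulative since boot, it's the
change over a few seconds that matters. A minimal Python sketch for diffing
two reads of that file, assuming the usual layout (a header row of CPU
names, then one labelled row of per-CPU counts per source); the function
names are made up for illustration:

```python
def parse_interrupts(text):
    """Map each IRQ label to (count summed over all CPUs, description)."""
    lines = text.splitlines()
    if not lines:
        return {}
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    totals = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        counts = []
        # Rows such as ERR and MIS carry a single count, so stop at the
        # first field that is not a number.
        for field in fields[:ncpu]:
            if not field.isdigit():
                break
            counts.append(int(field))
        description = " ".join(fields[len(counts):])
        totals[label.strip()] = (sum(counts), description)
    return totals


def top_sources(before, after, interval, top=5):
    """Rank interrupt sources by interrupts per second between two parses."""
    rates = []
    for label, (count, description) in after.items():
        delta = count - before.get(label, (0, ""))[0]
        rates.append((delta / interval, label, description))
    rates.sort(reverse=True)
    return rates[:top]
```

Read the file, wait a few seconds, read it again, and pass both parsed
results to top_sources(). Whatever is at the top of the list (timer, network
card, rescheduling interrupts...) is where I'd start looking.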

Happy to have a look if you wish... it's sort of what I do for a living.

Cheers,

Steve
( the real WTF is fedora on a server (: ) 

On Fri, 2012-10-26 at 10:40 +1300, Peter Glassenbury(UoC) wrote:
> Hi all,
> 
> I have had a few years sorting out hardware performance on
> some of our loaded linux compute servers. This one has me perplexed..
> It is not too much of a worry but I would like to know why
> this is happening .. and if I should be worried :-)
> 
> Machine (Fedora 16) is not running at peak performance but has done
> so previous to last week or so.. So there is possibly something about the
> mix of the current jobs that is causing it.
> 
> top - 10:13:09 up 164 days,  1:40, 13 users,  load average: 12.31, 14.60, 15.11
> 
> Tasks: 305 total,   3 running, 299 sleeping,   3 stopped,   0 zombie
> 
> Cpu(s): 14.4%us, 22.2%sy,  0.0%ni, 59.9%id,  0.0%wa,  2.0%hi,  1.4%si,  0.0%st
> 
> Mem:  65979204k total, 37972660k used, 28006544k free,   617516k buffers
> 
> Swap: 16777212k total,   101216k used, 16675996k free, 32201980k cached
> 
> So heaps of memory and swap. CPUs are sitting idle a lot of the time
> (they shouldn't be on this compute server).
> nfsiostat and iostat show next to nothing happening
> (it's a compute server, not an I/O server).
> 
> Vmstat has the weird bit that I haven't seen before: system interrupts and
> context switches are through the roof compared with anything I have seen.
> 
> $ vmstat 1
> procs -----------memory-------------- ---swap-- -----io---- --system--- ------cpu-----
>  r  b   swpd     free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
>  2  0 101216 28003040 617520 32203196    0    0     0    68 68295 71578 16 25 59  0  0
> 12  0 101216 28004640 617520 32203196    0    0     0     0 71740 72877 14 24 62  0  0
>  6  0 101216 28001972 617520 32203196    0    0     0     0 70366 72381 14 25 61  0  0
>  4  0 101216 27997920 617520 32203196    0    0     0     0 67163 68348 13 25 62  0  0
> 
> Googling found something about leap seconds and restarting ntp, which I have done.
> Anyone have ideas or suggestions on what to look at ?
> I would prefer not to do the "three finger salute" on this machine
> as some jobs have been running for weeks.
> 
> Cheers
> Pete
> 

-- 
Steve Holdoway BSc(Hons) MIITP 
http://www.greengecko.co.nz
MSN: [email protected]
Skype: sholdowa


_______________________________________________
Linux-users mailing list
[email protected]
http://lists.canterbury.ac.nz/mailman/listinfo/linux-users
