Although you say it's a compute server, not an I/O server, those figures initially scream I/O problems... all that time in sys and a high load average usually indicate that. Have you got a poorly configured database on there by any chance?
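A quick way to test the I/O theory, offered as a rough sketch: on Linux the load average counts tasks that are runnable (state R) plus tasks in uninterruptible sleep (state D, usually blocked on disk or NFS), so tallying thread states from /proc shows which kind is driving the load. The function name `count_task_states` and its `proc_root` parameter are made up for illustration. vmstat's `b` column gives a similar D count in aggregate.

```python
import os


def count_task_states(proc_root="/proc"):
    """Tally scheduler states of every thread listed under /proc.

    Linux's load average counts runnable (R) and uninterruptible (D)
    tasks, so comparing these two counts hints at CPU versus I/O pressure.
    """
    counts = {}
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue  # skip non-process entries such as /proc/self
        task_dir = os.path.join(proc_root, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    data = f.read()
            except OSError:
                continue  # thread exited while we were scanning
            # The command name sits in parentheses and may itself contain
            # spaces or ')', so the state letter is found after the last ')'.
            state = data[data.rfind(")") + 2]
            counts[state] = counts.get(state, 0) + 1
    return counts
```

Called with no arguments on the affected box, it returns a dict keyed by state letter; a large `D` entry would support the I/O theory, while a large `R` entry points back at CPU contention.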
Context switches and interrupts are pretty high... it could be that the leap second screwed up locking if you're running multithreaded apps, although I don't think there's been one for a few months now? What's in /proc/interrupts? That should give an idea of what resource is being hit. Does (h)top identify the processes causing the problem? If so, strace them.

Happy to have a look if you wish... it's sort of what I do for a living.

Cheers,

Steve

(the real WTF is Fedora on a server (: )

On Fri, 2012-10-26 at 10:40 +1300, Peter Glassenbury(UoC) wrote:
> Hi all,
>
> I have had a few years sorting out hardware performance on
> some of our loaded Linux compute servers. This one has me perplexed.
> It is not too much of a worry, but I would like to know why
> this is happening, and if I should be worried :-)
>
> The machine (Fedora 16) is not running at peak performance, but it did
> until last week or so. So there is possibly something about the
> mix of the current jobs that is causing it.
>
> top - 10:13:09 up 164 days, 1:40, 13 users, load average: 12.31, 14.60, 15.11
> Tasks: 305 total, 3 running, 299 sleeping, 3 stopped, 0 zombie
> Cpu(s): 14.4%us, 22.2%sy, 0.0%ni, 59.9%id, 0.0%wa, 2.0%hi, 1.4%si, 0.0%st
> Mem: 65979204k total, 37972660k used, 28006544k free, 617516k buffers
> Swap: 16777212k total, 101216k used, 16675996k free, 32201980k cached
>
> So, heaps of memory and swap. The CPUs are sitting idle a lot of the time
> (they shouldn't be on this compute server).
> nfsiostat and iostat show next to nothing happening
> (it's a compute server, not an I/O server).
>
> vmstat shows the weird bit that I haven't seen before: system interrupts and
> context switches are through the roof compared with anything I have seen.
>
> $ vmstat 1
> procs -----------memory-------------- ---swap-- -----io---- --system-- -----cpu-----
>  r  b   swpd     free   buff    cache   si   so    bi    bo    in    cs us sy id wa st
>  2  0 101216 28003040 617520 32203196    0    0     0    68 68295 71578 16 25 59  0  0
> 12  0 101216 28004640 617520 32203196    0    0     0     0 71740 72877 14 24 62  0  0
>  6  0 101216 28001972 617520 32203196    0    0     0     0 70366 72381 14 25 61  0  0
>  4  0 101216 27997920 617520 32203196    0    0     0     0 67163 68348 13 25 62  0  0
>
> Googling found something about leap seconds and restarting ntp, which I have
> done.
> Does anyone have ideas or suggestions on what to look at?
> I would prefer not to do the "three finger salute" on this machine,
> as some jobs have been running for weeks.
>
> Cheers
> Pete
>
--
Steve Holdoway BSc(Hons) MIITP
http://www.greengecko.co.nz
MSN: [email protected]
Skype: sholdowa
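To follow up the /proc/interrupts suggestion, here is a minimal sketch, assuming the usual layout of a CPU header row followed by one row per interrupt source with per-CPU counts first. It samples the file twice and ranks sources by rate, summed over all CPUs. The function names and defaults are illustrative, not from any existing tool. Rows such as LOC (local timer) and RES (rescheduling interrupts) are kept, since on a busy multithreaded box they can account for much of the total.

```python
import time


def read_interrupts(path="/proc/interrupts"):
    """Return {label: (total_count, description)} summed over all CPUs."""
    with open(path) as f:
        lines = f.read().splitlines()
    ncpu = len(lines[0].split())  # header row lists CPU0 CPU1 ...
    result = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        if not sep:
            continue  # not an interrupt row
        fields = rest.split()
        total = 0
        i = 0
        # Per-CPU counters come first; rows like ERR/MIS carry one count.
        while i < min(ncpu, len(fields)) and fields[i].isdigit():
            total += int(fields[i])
            i += 1
        result[label.strip()] = (total, " ".join(fields[i:]))
    return result


def busiest_interrupts(interval=1.0, top=10, path="/proc/interrupts"):
    """Sample twice, `interval` seconds apart; return top sources by rate."""
    before = read_interrupts(path)
    time.sleep(interval)
    after = read_interrupts(path)
    rates = []
    for label, (count, desc) in after.items():
        prev = before.get(label, (0, ""))[0]
        rates.append(((count - prev) / interval, label, desc))
    rates.sort(reverse=True)  # highest interrupts-per-second first
    return rates[:top]
```

On the affected server, `for rate, label, desc in busiest_interrupts(): print(f"{label:>6} {rate:10.0f}/s  {desc}")` lists the ten busiest sources. A single device IRQ dominating would point at hardware or a driver, while a high RES rate is consistent with threads constantly waking each other, which would also show up as context switches.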
_______________________________________________
Linux-users mailing list
[email protected]
http://lists.canterbury.ac.nz/mailman/listinfo/linux-users
