On Wed, 13 May 2009, John Baldwin wrote:

> Well, you had a whole lot of page faults and other VM activity, plus 500k
> syscalls.  The 'w' is a count of swapped processes, so basically your box is
> swapping a whole lot it seems.  I think your box is just overloaded.

I knew I was going to regret posting that :(

What I posted was what vmstat 5 shows after the issue *starts*, not what it normally looks like ... right now, after 10 hours of uptime, and all the same processes running, it looks like:

io# vmstat 5 (10 hours uptime now)
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
 0 1 0  10477M   301M  3503  13   1   2  3620 286   0   0  331 45491 4566 26  8 66
 0 1 0  10430M   305M   278   7   0   0   550   0  18   0  186 19243 2917 4  3 93
 1 1 0  10474M   295M   511   0   0   0   359   0  91   0  253 11632 3516 7  3 90
 0 1 0  10447M   310M   819   3   0   0  1473   0  14   0  143 29575 2486 8  3 89
 0 1 0  10558M   295M  5008  18  13   5  4128   0 121   0  345 24212 4215 16  7 77

Right now, io is running ~775 processes ... at the time of the vmstat I provided earlier, it was up to 1400 processes ... since there are only 5 minutes between script runs, something is causing it to go from zero swap to high swap within a very short period of time, but since things get badly locked up when it happens, I can't isolate where ...
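
One way to narrow down the moment it tips over would be to watch the vmstat rows themselves rather than wait for the next 5-minute script run. What follows is only a rough sketch, assuming the header layout shown above; the function names and the thresholds (`max_swapped`, `max_page_io`) are made up for illustration and would need tuning:

```python
def parse_vmstat_row(header, row):
    """Map one vmstat data row onto its header names.

    'sy' appears twice in the header (faults, then cpu), so the
    second occurrence of any repeated name is stored as 'cpu_<name>'.
    """
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError("header and row have different column counts")
    parsed = {}
    for name, value in zip(names, values):
        key = name if name not in parsed else "cpu_" + name
        parsed[key] = value
    return parsed


def looks_like_paging(parsed, max_swapped=0, max_page_io=50):
    """Return True when a row shows swapped processes or heavy page in/out.

    The thresholds are hypothetical defaults, not recommended values.
    """
    swapped = int(parsed["w"])
    page_io = int(parsed["pi"]) + int(parsed["po"])
    return swapped > max_swapped or page_io > max_page_io
```

A wrapper could feed each line of `vmstat 5` through these two functions and kick off the ps captures the first time `looks_like_paging` returns True.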

I've got the following two ps outputs from the time of the high paging:

/bin/ps -aucxHl -O jid > ps-long.out
/bin/ps -aux -O jid > ps-short.out

Is there anything in there that I could look at as far as what is putting things over the edge?
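
As a first pass over ps-short.out, something like the following could total resident memory per jail. It is a sketch under assumptions: it takes the column positions from the header line, and it assumes that header contains `JID` and `RSS` columns and that only the last column (the command) contains spaces. The function name `rss_by_jail` is made up for illustration:

```python
from collections import defaultdict


def rss_by_jail(ps_text):
    """Sum the RSS column per JID and return (jid, total_rss) pairs, largest first.

    Column positions are read from the header line, so the exact column
    order of the ps output does not matter as long as JID and RSS exist.
    """
    lines = [line for line in ps_text.splitlines() if line.strip()]
    header = lines[0].split()
    jid_col = header.index("JID")
    rss_col = header.index("RSS")
    totals = defaultdict(int)
    for line in lines[1:]:
        # Split into at most len(header) fields so the command keeps its spaces.
        fields = line.split(None, len(header) - 1)
        if len(fields) <= max(jid_col, rss_col):
            continue  # skip truncated lines
        totals[fields[jid_col]] += int(fields[rss_col])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Reading the file with `open("ps-short.out").read()` and passing the text in would give a list with the heaviest jail first, which might at least show whether one jail is responsible.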

====

As to the 'overloaded server', here is another server, with more running on it, but the exact same configuration:

neptune# vmstat 5 (3 days, 18 hours uptime now)
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in   sy   cs us sy id
 0 0 0  12521M   303M  3969  15   5   3  2271 1603   0   0  444 6491 5165 37 19 44
 0 0 0  12464M   309M  3009   1   0  15  2833   0 104   0  296 9378 3689  7  5 88
23 0 0  12476M   297M  3845   3   0   0  2627   0  31   0  279 10545 2986 14  5 81
 0 1 0  12530M   266M  5259   0   1   0  2551   0 145   0  432 18070 4133 45  8 47
 1 0 0  12587M   237M  7049   0   1   0  4484   0 171   0  357 15953 4715 29  7 64

So, normally these servers purr ... and are highly responsive ...

In fact, here is an older 32-bit server, with less RAM, running about 50% more processes than neptune:

mercury# vmstat 5
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 pa0   in   sy  cs us sy id
 3 14 1   6817M   114M  641   7   3   1 1036 386   0   0 1109  464 157  5  5 90
 0 8 0   6817M   224M  596  33   0   5 5667 3850  86   0 1303 5768 3885  6 7 87
 1 10 0   6824M   220M 4332  32   2   0 3228   0  17   0  755 9689 3057  8 7 85
 0 9 0   6798M   219M  430   0   0   0 712   0  12   0 1274 4276 3877  2  2 95
 0 11 0   6830M   205M 1026   4   1   3 481   0  84   0 1503 5586 4370  6 4 89



----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [email protected]                              MSN . [email protected]
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664