Re: More data on 7.2-RELEASE "hangs"

Marc G. Fournier Wed, 13 May 2009 10:45:49 -0700

On Wed, 13 May 2009, John Baldwin wrote:

> Well, you had a whole lot of page faults and other VM activity, plus 500k
> syscalls.  The 'w' is a count of swapped processes, so basically your box is
> swapping a whole lot it seems.  I think your box is just overloaded.


I knew I was going to regret posting that :(

What I posted was what vmstat 5 shows after the issue *starts*, not what it normally looks like ... right now, after 10 hours of uptime, and all the same processes running, it looks like:


io# vmstat 5 (10 hours uptime now)
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr  sr da0 pa0   in    sy   cs us sy id
 0 1 0  10477M   301M  3503  13   1   2  3620 286   0   0  331 45491 4566 26  8 66
 0 1 0  10430M   305M   278   7   0   0   550   0  18   0  186 19243 2917  4  3 93
 1 1 0  10474M   295M   511   0   0   0   359   0  91   0  253 11632 3516  7  3 90
 0 1 0  10447M   310M   819   3   0   0  1473   0  14   0  143 29575 2486  8  3 89
 0 1 0  10558M   295M  5008  18  13   5  4128   0 121   0  345 24212 4215 16  7 77
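Reading these rows by eye is error-prone: the disk columns vary per host, and `sy` appears twice (system calls under faults, system CPU time under cpu). Below is a minimal Python sketch that maps one data row onto named columns, assuming the layout shown above (3 procs, 2 memory and 6 page columns, one column per disk, then in/sy/cs and us/sy/id). The keys `syscalls` and `sy_cpu` and the helper `parse_vmstat_row` are hypothetical names chosen for this sketch, not anything vmstat itself defines.

```python
# Minimal sketch: map one FreeBSD `vmstat 5` data row onto named fields.
# Assumption: the layout matches the output in this thread, i.e. 3 procs
# columns, 2 memory columns, 6 page columns, one column per disk, then
# in/sy/cs (faults) and us/sy/id (cpu). The two "sy" columns are renamed
# (hypothetical names) so that every key is unique.

FIXED_HEAD = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr", "sr"]
FIXED_TAIL = ["in", "syscalls", "cs", "us", "sy_cpu", "id"]
MEMORY = {"avm", "fre"}  # printed with a size suffix such as "M"; kept as text


def parse_vmstat_row(header: str, row: str) -> dict:
    """Return {column_name: value} for one vmstat data row."""
    names = header.split()
    # Whatever sits between the fixed page and faults columns is a disk.
    disks = names[len(FIXED_HEAD):len(names) - len(FIXED_TAIL)]
    columns = FIXED_HEAD + disks + FIXED_TAIL
    values = row.split()
    if len(values) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(values)}")
    return {
        name: (value if name in MEMORY else int(value))
        for name, value in zip(columns, values)
    }
```

For the first io row above, with the unwrapped header line `r b w avm fre ... us sy id`, this gives `w == 0`, `pi == 1`, `po == 2` and `syscalls == 45491`: no swapped processes and almost no paging at that moment.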

Right now, IO is running ~775 processes ... at the time of the vmstat I provided earlier, it was up to 1400 processes ... since there are only 5 minutes between script runs, something is causing it to go from zero swap to high swap within a very short period of time, but since things get badly locked up when it happens, I can't isolate where ...
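Since the jump from no swapping to heavy swapping happens inside the 5 minute window between script runs, one option is to trigger the ps captures from the vmstat stream itself instead of on a timer. The following is a minimal Python sketch, not a tested tool: `PO_LIMIT`, `should_capture` and `watch` are hypothetical names, the threshold of 50 page-outs per interval is only a placeholder, and it assumes the leading vmstat columns are `r b w avm fre flt re pi po` as in the output above. It can only help if the watcher still gets scheduled once the box starts to bog down.

```python
import subprocess
import time

# Hypothetical trigger level: page-outs per interval treated as
# "swapping has started". 50 is a placeholder; tune it per host.
PO_LIMIT = 50

# The two ps invocations quoted in this thread, captured when triggered.
PS_COMMANDS = {
    "ps-long": ["/bin/ps", "-aucxHl", "-O", "jid"],
    "ps-short": ["/bin/ps", "-aux", "-O", "jid"],
}


def should_capture(row: str, po_limit: int = PO_LIMIT) -> bool:
    """True if a vmstat data row shows swapped processes or heavy page-outs.

    Assumes the leading columns are r, b, w, avm, fre, flt, re, pi, po,
    so 'w' is field 2 and 'po' is field 8 regardless of the disk columns.
    """
    fields = row.split()
    if len(fields) < 9 or not fields[0].isdigit():
        return False  # header line or anything else that is not a data row
    return int(fields[2]) > 0 or int(fields[8]) >= po_limit


def watch(interval: int = 5, cooldown_lines: int = 12) -> None:
    """Follow `vmstat <interval>` and snapshot ps output on the first sign of swapping."""
    proc = subprocess.Popen(["vmstat", str(interval)],
                            stdout=subprocess.PIPE, text=True)
    skip = 0
    for line in proc.stdout:
        if skip:
            skip -= 1  # still cooling down after the last snapshot
            continue
        if should_capture(line):
            stamp = time.strftime("%Y%m%d-%H%M%S")
            for name, cmd in PS_COMMANDS.items():
                with open(f"{name}-{stamp}.out", "w") as out:
                    subprocess.run(cmd, stdout=out)
            skip = cooldown_lines  # avoid one snapshot per interval while it lasts
```

Called as `watch()` from a small script, it writes `ps-long-<timestamp>.out` and `ps-short-<timestamp>.out` the first time `w` goes non-zero or `po` reaches the limit, then ignores the next 12 lines of vmstat output (about a minute at a 5 second interval) so a long episode does not produce a snapshot every interval.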


I've got the following two ps outputs at the time of the high paging:

/bin/ps -aucxHl -O jid > ps-long.out
/bin/ps -aux -O jid > ps-short.out

Is there anything in there that I could look at as far as what is putting things over the edge?
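One way to see what is pushing things over the edge is to total resident memory per jail from the ps-short.out capture. A minimal Python sketch, assuming a header line that names the columns (it looks up `JID` and `RSS` by name, so the exact column order does not matter) with `COMMAND` as the last column; `rss_by_jail` is a hypothetical helper. ps-long.out is less suitable because `-H` lists every thread, which would count a process's RSS once per thread. RSS also counts shared pages in every process that maps them, so the totals are an upper bound rather than an exact breakdown.

```python
from collections import defaultdict


def rss_by_jail(ps_output: str) -> list[tuple[str, int, int]]:
    """Sum resident memory (RSS, in KB) and count processes per jail ID.

    Assumes ps output whose first line names the columns, including JID
    and RSS, with COMMAND last (it may contain spaces). Returns
    (jid, total_rss_kb, process_count) tuples, largest RSS first.
    """
    lines = ps_output.strip().splitlines()
    header = lines[0].split()
    jid_col, rss_col = header.index("JID"), header.index("RSS")
    totals = defaultdict(lambda: [0, 0])  # jid -> [rss_kb, processes]
    for line in lines[1:]:
        # Every column before COMMAND is one token, so cap the split there.
        fields = line.split(None, len(header) - 1)
        if len(fields) < len(header):
            continue  # skip short or malformed lines
        entry = totals[fields[jid_col]]
        entry[0] += int(fields[rss_col])
        entry[1] += 1
    return sorted(((jid, kb, n) for jid, (kb, n) in totals.items()),
                  key=lambda item: item[1], reverse=True)
```

Comparing the top of this list from a capture taken during the high paging with one taken at a quiet time would show which jail grew, if any single one did.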


====

As to the 'overloaded server', here is another server, with more running on it, but the exact same configuration:


neptune# vmstat 5 (3 days, 18 hours uptime now)
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre   flt  re  pi  po    fr   sr da0 pa0   in    sy   cs us sy id
 0 0 0  12521M   303M  3969  15   5   3  2271 1603   0   0  444  6491 5165 37 19 44
 0 0 0  12464M   309M  3009   1   0  15  2833    0 104   0  296  9378 3689  7  5 88
23 0 0  12476M   297M  3845   3   0   0  2627    0  31   0  279 10545 2986 14  5 81
 0 1 0  12530M   266M  5259   0   1   0  2551    0 145   0  432 18070 4133 45  8 47
 1 0 0  12587M   237M  7049   0   1   0  4484    0 171   0  357 15953 4715 29  7 64

So, normally these servers purr ... and are highly responsive ...

In fact, here is an older 32-bit server with less RAM, running about 50% more processes than neptune:


mercury# vmstat 5
 procs      memory      page                    disks     faults         cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 pa0   in   sy  cs us sy id
 3 14 1   6817M   114M  641   7   3   1 1036 386   0   0 1109  464 157  5  5 90
 0 8 0   6817M   224M  596  33   0   5 5667 3850  86   0 1303 5768 3885  6 7 87
 1 10 0   6824M   220M 4332  32   2   0 3228   0  17   0  755 9689 3057  8 7 85
 0 9 0   6798M   219M  430   0   0   0 712   0  12   0 1274 4276 3877  2  2 95
 0 11 0   6830M   205M 1026   4   1   3 481   0  84   0 1503 5586 4370  6 4 89



----
Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email . [email protected]                              MSN . [email protected]
Yahoo . yscrappy               Skype: hub.org        ICQ . 7615664
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[email protected]"
