When I came in to work today, top on daedalus showed all three load
averages above 27. vmstat -w 5 showed that the number of processes in
the run queue was jumping up to over 350 fairly frequently, and there's
some evidence of spikes in CPU usage.  So I bounced us back to httpd
2_0_28, and the bad behavior has definitely gotten better.

Either we're doing something to make more processes runnable at times,
or httpd-related processes are sometimes staying in the run queue longer
than they used to.  I think it's the latter.  For one thing, the idle
CPU drops down to zero periodically, which I've never seen with 2_0_28.

Jeff & I discussed how to troubleshoot this baby.  Some thoughts:
* compare trusses and look for abnormalities
* scrutinize the error logs
* calculate and log how much CPU time each request takes, the thought
being that some requests are burning a lot more CPU than they used to.  
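
For the last item, here is a rough sketch of the bookkeeping.  It's
Python purely for illustration (in httpd this would be C wrapped around
the request handler), and timed_request, busy_handler and the log list
are made-up names, not anything in the server.  The idea is just to
sample getrusage() before and after each request and log the
difference.  Note that RUSAGE_SELF is process-wide, so with a threaded
MPM the numbers would include whatever the other threads were doing.

```python
import resource
import time


def cpu_times():
    """Return (user, system) CPU seconds used so far by this process."""
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_utime, ru.ru_stime


def timed_request(handler, request, log):
    """Run handler(request) and append its CPU and wall-clock cost to log.

    handler, request and log are hypothetical stand-ins, not httpd APIs.
    """
    user0, sys0 = cpu_times()
    wall0 = time.time()
    try:
        return handler(request)
    finally:
        user1, sys1 = cpu_times()
        log.append({
            "request": request,
            "user": user1 - user0,
            "system": sys1 - sys0,
            "wall": time.time() - wall0,
        })


def busy_handler(request):
    # Stand-in for a request that burns some CPU.
    return sum(i * i for i in range(200000))


if __name__ == "__main__":
    log = []
    timed_request(busy_handler, "GET /index.html", log)
    print(log[0])
```

Comparing the logged user+system time per request between the two
builds should show whether some requests got more expensive.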

Any other ideas?  

If this isn't a showstopper, it's close.

Greg
-----------------------------------------------------------------------------
(with 2.0.30-dev)

[gregames@daedalus gregames]$ top
last pid:  3248;  load averages: 39.32, 32.01, 27.17    up 4+16:31:26  06:49:45
552 processes: 8 running, 541 sleeping, 3 zombie
CPU states: 18.1% user,  0.0% nice, 23.8% system,  4.2% interrupt, 53.8% idle
[...]
[gregames@daedalus gregames]$ top
last pid:  3517;  load averages: 14.83, 26.29, 25.36    up 4+16:32:21  06:50:40
467 processes: 255 running, 210 sleeping, 2 zombie
CPU states:  0.8% user,  0.0% nice, 11.6% system,  2.5% interrupt, 85.1% idle
[...]
[gregames@daedalus gregames]$ vmstat -w 5

(first column is the size of the run queue; last column is idle CPU)

 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy  cs us sy id
21 3 0  360208  83460  309   2   1   0 329  64   0   0  675 2543 1480  4  5 90
65 3 0  355192  84776  564   2   1   0 618   0  49   7 1238 4590 7486  9 19 73
72 3 0  350932  84324  594   6   2   0 607   0  75  10 1392 5040 8431 10 20 70
53 4 0  347908  83620  682   4   2   0 660   0  53   8 1280 5072 8647 10 20 69
360 3 0  346000  82164 1336   3   4   0 1162   0  80  11 1415 8164 8891 26 27 47
19 4 0  346792  80344  606   1   2   0 559   0  14  10 1283 4140 8060 11 19 71
18 3 0  337024  79208  498   9   2   0 596   0  15  21 1598 4979 6531 11 20 70
12 4 0  343628  71740  818   4   2   0 546   0  63  13 1449 5099 6503  9 18 73
26 3 0  352104  65332 1322   5   3   0 930   0  66  14 1340 8979 7786 11 25 64
25 5 0  387428  46428 5383   4   2   0 4397   0  16   9 1876 27473 15659 29 71 0
40 6 0  404196  36828 5136   2   2   0 3927   0  46  12 2050 25730 15231 28 72 0
26 3 0  390944  67624 3763   3   3   0 3513 1365  53  12 2042 21203 15966 22 69 9
388 6 0  354580  82856  572   2   2   0 1325   0  28   6 1305 7314 10551  9 28 62
356 4 0  341624  86328 1247   2   1   0 1369   0  70   8 1225 4977 6641 20 20 60
12 3 0  338044  86392  384   8   3   0 384   0   1  10 1028 3300 6861 14 15 71
341 3 0  336016  85280   24   9   3   0  48   0   0   8 1010 2983 6120  4 14 83
18 3 0  327908  86340  660   5   3   0 696   0  11   9 1112 3515 6304 13 16 71
320 3 0  322972  85720  522   2   2   0 512   0   3  14 1083 3043 5146 14 12 74
 8 3 0  323028  83276  255   3   2   0 211   0  12  11 1112 2712 4367  4 11 85
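
To pick the spikes out of a longer capture, something like the
following would do.  It's only a sketch: it assumes each vmstat sample
sits on one line with the 19 columns in the header above, the function
names are made up, and the run-queue cutoff of 300 is arbitrary.  The
four sample rows are copied from the output above.

```python
SAMPLES = """\
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy  cs us sy id
21 3 0  360208  83460  309   2   1   0 329  64   0   0  675 2543 1480  4  5 90
360 3 0  346000  82164 1336   3   4   0 1162   0  80  11 1415 8164 8891 26 27 47
25 5 0  387428  46428 5383   4   2   0 4397   0  16   9 1876 27473 15659 29 71 0
388 6 0  354580  82856  572   2   2   0 1325   0  28   6 1305 7314 10551  9 28 62
"""


def parse_vmstat(text, fields=19):
    """Return (run_queue, idle_cpu) pairs for each numeric sample line."""
    rows = []
    for line in text.splitlines():
        cols = line.split()
        # Skip headers and anything that is not a complete numeric sample.
        if len(cols) != fields or not all(c.isdigit() for c in cols):
            continue
        rows.append((int(cols[0]), int(cols[-1])))
    return rows


def flag_samples(rows, rq_threshold=300):
    """Split out run-queue spikes and samples with no idle CPU at all."""
    spikes = [r for r, idle in rows if r >= rq_threshold]
    no_idle = [r for r, idle in rows if idle == 0]
    return spikes, no_idle


if __name__ == "__main__":
    spikes, no_idle = flag_samples(parse_vmstat(SAMPLES))
    print("run queue spikes:", spikes)
    print("run queue when idle CPU hit 0:", no_idle)
```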
 
-----------------------------------------------------------------------------
(with 2_0_28)

[gregames@daedalus gregames]$ top
last pid: 43953;  load averages:  0.48,  0.62,  0.81      up 4+21:09:23  11:27:42
258 processes: 1 running, 257 sleeping
CPU states:  2.8% user,  0.0% nice,  5.1% system,  2.4% interrupt, 89.8% idle

[gregames@daedalus gregames]$ vmstat -w 5
 procs      memory      page                    disks     faults      cpu
 r b w     avm    fre  flt  re  pi  po  fr  sr da0 da1   in   sy  cs us sy id
 8 3 0  347520  56464  324   2   1   0 342  65   0   0  693 2645 1499  4  6 90
63 3 0  341884  57496  356   6   0   0 403   0   4   8  754 2451 1372  8  5 87
19 3 0  341808  56672  386   0   1   0 353   0   7  10  886 2829 1273  8  7 86
35 3 0  351424  53896  219   3   2   0 122   0   0  14 1023 2481 1033  4  8 88
 7 3 0  351504  52840   31   1   1   0   1   0   1   4  916 2326 1180  2  5 93
68 3 0  431392  39536 4274   1   2   0 3315 1363  61   5 1504 19223 4338 21 46 34
16 3 0  382092  62120  681   1   3   0 1806   0  10   4 1422 8630 2455  5 22 73
13 3 0  371612  66116  700   9   6   0 822   0  11   9  988 4534 1637 10 10 80
45 3 0  360188  67256  449   2   3   0 480   0  20   5 1024 4424 1347  6  9 85
11 3 0  357000  66488 1809   1   1   0 1664   0   7   8  864 18482 1276  4 17 79
16 3 0  356388  65428  301   1   1   0 276   0   9   8 1140 3559 1490 10  9 81
15 3 0  353324  62920  139   1   1   0 138   0  36  11 1069 3355 1545  4  6 89
 9 4 0  351680  60328  665   3   1   0 550   0  25  14 1123 4700 1474 12 10 79
35 3 0  350108  59352  930   3   1   0 865   0  27   7 1047 7593 1224 10 10 81
65 3 0  355612  57188  907   1   1   0 762   0  52   6 1018 4586 1515 10 12 78
57 3 0  353360  56660  445   0   1   0 451   0  33   7  909 3744 1436  7  9 84
23 3 0  355708  53164  379   2   1   0 256   0  45   6  946 3333 1487  6  7 88
10 4 0  369888  51636  586   2   1   0 535   0  54   6 1066 4817 1174  7 11 82
30 3 0  369504  49556  258   4   1   0 212   0  20   8  920 3304 1189  5  7 88
-------- Original Message --------
Subject: upgrade to FreeBSD 4.5-PRERELEASE
Date: Fri, 28 Dec 2001 14:49:56 -0800 (PST)
From: Brian Behlendorf <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]


I've upgraded icarus and daedalus to the most recent cut of the
"stable" branch of FreeBSD, which is currently named "4.5-PRERELEASE" as
the 4.5 release is imminent.  There have been lots of improvements in
performance and stability with this release, and it's good to keep
current anyways.

I've noticed, btw, an occurrence of load spiking on daedalus in the last
week - where the load jumps up to 30 or so for a few minutes then back
down.  I get a page whenever the 10-minute load average is above 8, and
when I get that page I also get a quick "top" output, but by the time I
get that notification there's no clear process causing that load.  So
I've been getting 10-20 pages per day on my phone due to the load,
without a way to tell what's been causing it.  The only thing I can
think of that has changed significantly over the last week was a newer
httpd being installed.  Has anyone else seen this from recent httpd 2.0
releases?

        Brian
