When I came in to work today, top on daedalus showed all three load averages above 27. vmstat -w 5 showed the number of processes in the run queue jumping to over 350 fairly frequently, and there's some evidence of spikes in CPU usage. So I bounced us back to httpd 2_0_28, and the bad behavior has definitely gotten better.
Either we're doing something to make more processes runnable at times, or httpd-related processes are staying in the run queue longer than they used to. I think it's the latter. For one thing, the idle CPU drops down to zero periodically, which I've never seen with 2_0_28.

Jeff & I discussed how to troubleshoot this baby. Some thoughts:

* compare trusses and look for abnormalities
* scrutinize the error logs
* calculate and log how much CPU time each request takes, the thought being that some requests are burning a lot more CPU than they used to

Any other ideas? If this isn't a showstopper, it's close.

Greg

-----------------------------------------------------------------------------
(with 2.0.30-dev)

[gregames@daedalus gregames]$ top
last pid:  3248;  load averages: 39.32, 32.01, 27.17   up 4+16:31:26  06:49:45
552 processes: 8 running, 541 sleeping, 3 zombie
CPU states: 18.1% user,  0.0% nice, 23.8% system,  4.2% interrupt, 53.8% idle
[...]

[gregames@daedalus gregames]$ top
last pid:  3517;  load averages: 14.83, 26.29, 25.36   up 4+16:32:21  06:50:40
467 processes: 255 running, 210 sleeping, 2 zombie
CPU states:  0.8% user,  0.0% nice, 11.6% system,  2.5% interrupt, 85.1% idle
[...]
[gregames@daedalus gregames]$ vmstat -w 5
(first column is the size of the run queue; last column is idle CPU)
 procs      memory      page                    disks     faults      cpu
  r  b w    avm    fre  flt re pi po   fr   sr da0 da1   in    sy    cs us sy id
 21  3 0 360208  83460  309  2  1  0  329   64   0   0  675  2543  1480  4  5 90
 65  3 0 355192  84776  564  2  1  0  618    0  49   7 1238  4590  7486  9 19 73
 72  3 0 350932  84324  594  6  2  0  607    0  75  10 1392  5040  8431 10 20 70
 53  4 0 347908  83620  682  4  2  0  660    0  53   8 1280  5072  8647 10 20 69
360  3 0 346000  82164 1336  3  4  0 1162    0  80  11 1415  8164  8891 26 27 47
 19  4 0 346792  80344  606  1  2  0  559    0  14  10 1283  4140  8060 11 19 71
 18  3 0 337024  79208  498  9  2  0  596    0  15  21 1598  4979  6531 11 20 70
 12  4 0 343628  71740  818  4  2  0  546    0  63  13 1449  5099  6503  9 18 73
 26  3 0 352104  65332 1322  5  3  0  930    0  66  14 1340  8979  7786 11 25 64
 25  5 0 387428  46428 5383  4  2  0 4397    0  16   9 1876 27473 15659 29 71  0
 40  6 0 404196  36828 5136  2  2  0 3927    0  46  12 2050 25730 15231 28 72  0
 26  3 0 390944  67624 3763  3  3  0 3513 1365  53  12 2042 21203 15966 22 69  9
388  6 0 354580  82856  572  2  2  0 1325    0  28   6 1305  7314 10551  9 28 62
356  4 0 341624  86328 1247  2  1  0 1369    0  70   8 1225  4977  6641 20 20 60
 12  3 0 338044  86392  384  8  3  0  384    0   1  10 1028  3300  6861 14 15 71
341  3 0 336016  85280   24  9  3  0   48    0   0   8 1010  2983  6120  4 14 83
 18  3 0 327908  86340  660  5  3  0  696    0  11   9 1112  3515  6304 13 16 71
320  3 0 322972  85720  522  2  2  0  512    0   3  14 1083  3043  5146 14 12 74
  8  3 0 323028  83276  255  3  2  0  211    0  12  11 1112  2712  4367  4 11 85
-----------------------------------------------------------------------------
(with 2_0_28)

[gregames@daedalus gregames]$ top
last pid: 43953;  load averages: 0.48, 0.62, 0.81   up 4+21:09:23  11:27:42
258 processes: 1 running, 257 sleeping
CPU states:  2.8% user,  0.0% nice,  5.1% system,  2.4% interrupt, 89.8% idle

[gregames@daedalus gregames]$ vmstat -w 5
 procs      memory      page                    disks     faults      cpu
  r  b w    avm    fre  flt re pi po   fr   sr da0 da1   in    sy   cs us sy id
  8  3 0 347520  56464  324  2  1  0  342   65   0   0  693  2645 1499  4  6 90
 63  3 0 341884  57496  356  6  0  0  403    0   4   8  754  2451 1372  8  5 87
 19  3 0 341808  56672  386  0  1  0  353    0   7  10  886  2829 1273  8  7 86
 35  3 0 351424  53896  219  3  2  0  122    0   0  14 1023  2481 1033  4  8 88
  7  3 0 351504  52840   31  1  1  0    1    0   1   4  916  2326 1180  2  5 93
 68  3 0 431392  39536 4274  1  2  0 3315 1363  61   5 1504 19223 4338 21 46 34
 16  3 0 382092  62120  681  1  3  0 1806    0  10   4 1422  8630 2455  5 22 73
 13  3 0 371612  66116  700  9  6  0  822    0  11   9  988  4534 1637 10 10 80
 45  3 0 360188  67256  449  2  3  0  480    0  20   5 1024  4424 1347  6  9 85
 11  3 0 357000  66488 1809  1  1  0 1664    0   7   8  864 18482 1276  4 17 79
 16  3 0 356388  65428  301  1  1  0  276    0   9   8 1140  3559 1490 10  9 81
 15  3 0 353324  62920  139  1  1  0  138    0  36  11 1069  3355 1545  4  6 89
  9  4 0 351680  60328  665  3  1  0  550    0  25  14 1123  4700 1474 12 10 79
 35  3 0 350108  59352  930  3  1  0  865    0  27   7 1047  7593 1224 10 10 81
 65  3 0 355612  57188  907  1  1  0  762    0  52   6 1018  4586 1515 10 12 78
 57  3 0 353360  56660  445  0  1  0  451    0  33   7  909  3744 1436  7  9 84
 23  3 0 355708  53164  379  2  1  0  256    0  45   6  946  3333 1487  6  7 88
 10  4 0 369888  51636  586  2  1  0  535    0  54   6 1066  4817 1174  7 11 82
 30  3 0 369504  49556  258  4  1  0  212    0  20   8  920  3304 1189  5  7 88

-------- Original Message --------
Subject: upgrade to FreeBSD 4.5-PRERELEASE
Date: Fri, 28 Dec 2001 14:49:56 -0800 (PST)
From: Brian Behlendorf <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]

I've upgraded icarus and daedalus to the most recent cut of the "stable" branch of FreeBSD, which is currently named "4.5-PRERELEASE" as the 4.5 release is imminent. There have been lots of improvements in performance and stability with this release, and it's good to keep current anyways.

I've noticed, btw, an occurrence of load spiking on daedalus in the last week - where the load jumps up to 30 or so for a few minutes, then back down. I get a page whenever the 10-minute load average is above 8, and when I get that page I also get a quick "top" output, but by the time I get that notification there's no clear process causing that load.
So I've been getting 10-20 pages per day on my phone due to the load, without a way to tell what's been causing it. The only thing I can think of that has changed significantly over the last week was a newer httpd being installed. Has anyone else seen this from recent httpd 2.0 releases?

Brian