>Number: 1950
>Category: os-linux
>Synopsis: All child processes die. Parent remains and no longer responds to queries
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: apache
>State: open
>Class: sw-bug
>Submitter-Id: apache
>Arrival-Date: Sun Mar 15 04:10:00 PST 1998
>Last-Modified:
>Originator: [EMAIL PROTECTED]
>Organization: apache
>Release: 1.2.4 & 1.2.5
>Environment:
Linux RedHat 5.0, with latest libc & ld.so patches
glibc-2.0.6-9, ld.so-1.9.5-5, libc-5.3.12-25
Kernel 2.0.33 & 2.0.32 on HP SMP 2xPPro i686 512Mb Ram
>Description:
All child processes die for no apparent reason. This problem started for me when I upgraded to RedHat 5.0 from 4.2. Prior to this, the system was working fine. The problem happens very often on my site - sometimes as frequently as once every 2 hours. It occurs with both Apache 1.2.4 & 1.2.5. Also with Linux kernel 2.0.32 & 2.0.33.
When the children all die, I get something like this in the error_log:

[Sun Mar 15 00:56:02 1998] access to /home/live/html/cdr/pub/cdd/cddpub.htm failed for gaitana.interred.net.co, reason: File does not exist
[Sun Mar 15 00:56:29 1998] access to /home/live/html/architext/AT-aimquery.html failed for proxy.arcos.org, reason: File does not exist
[Sun Mar 15 00:58:24 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:04 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 00:59:35 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer
[Sun Mar 15 01:01:52 1998] accept: (client socket): Connection reset by peer

I also have a slightly hacked version of log_server_status running every minute. It reports the following for the minutes leading up to the above event:

004300:209:6:2549:1.59382
004401:211:7:2609:1.5662
004500:216:8:2660:1.54391
004600:217:8:2739:1.52671
004700:221:8:2833:1.53781
004800:227:9:2903:1.59585
004900:237:7:2998:1.67424
005000:236:9:3040:1.66213
005101:242:8:3101:1.63953
005201:247:3:3133:1.61418
005300:248:2:3171:1.61129
005400:245:5:3212:1.58615
005500:250:0:3292:1.56556
005601:247:3:3363:1.56482
005701:250:0:3414:1.55866
005759:250:0:3457:1.53429
005901:250:0:3483:1.50583

From that point on, it can't speak to the server anymore.
>How-To-Repeat:
This bug seems to manifest itself on a random basis. I can't directly repeat the problem. However, I suspect that it is related to the amount of usage that the server is subjected to. There are 3 other httpd groups running on the same system, bound to other IP alias addresses. All are compiled exactly the same way. These never exhibit this or any other problem. Only our main server (http://www.who.ch) crashes.
But it is typically subjected to 2-8 queries per second.
>Fix:
As mentioned earlier, the problem started for me when I upgraded from RedHat 4.2 to 5.0. Since I am running the same kernel (2.0.33) as before, even the same binary, I don't suspect the problem is there. However, the libc libraries are drastically different in 5.0. I think the problem might be there. I have surgically examined my system for other cron jobs et al. that might interfere with the httpd and come up blank.

My latest attempt at fixing this has been to statically compile and link the server on a RedHat 4.2 machine. The binary is huge (500K) but I don't really care because I have a lot of RAM. This has now been running for about an hour without any problems on the RedHat 5.0 machine. Time will tell....
>Audit-Trail:
>Unformatted:
[In order for any reply to be added to the PR database, ]
[you need to include <[EMAIL PROTECTED]> in the Cc line ]
[and leave the subject line UNCHANGED. This is not done]
[automatically because of the potential for mail loops. ]
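
For anyone reading the log_server_status samples in the description, here is a minimal Python sketch that parses them and reports when the third field drops to zero and stays there. The field order (time, busy servers, idle servers, total accesses, CPU load) is an assumption: the report says the script was hacked and does not document its output. The names Sample, parse_line and saturated_since are hypothetical helpers, not part of Apache.

```python
from typing import Iterable, NamedTuple, Optional

# Last nine samples copied from the report above.
SAMPLES = """\
005101:242:8:3101:1.63953
005201:247:3:3133:1.61418
005300:248:2:3171:1.61129
005400:245:5:3212:1.58615
005500:250:0:3292:1.56556
005601:247:3:3363:1.56482
005701:250:0:3414:1.55866
005759:250:0:3457:1.53429
005901:250:0:3483:1.50583
"""


class Sample(NamedTuple):
    time: str        # HHMMSS timestamp, kept as text
    busy: int        # ASSUMED: number of busy servers
    idle: int        # ASSUMED: number of idle servers
    accesses: int    # ASSUMED: total accesses so far
    cpu_load: float  # ASSUMED: CPU load figure


def parse_line(line: str) -> Sample:
    """Split one colon-separated sample into typed fields."""
    time, busy, idle, accesses, cpu = line.strip().split(":")
    return Sample(time, int(busy), int(idle), int(accesses), float(cpu))


def saturated_since(samples: Iterable[Sample]) -> Optional[str]:
    """Return the timestamp at which the final unbroken run of
    idle == 0 samples begins, or None if the last sample has idle servers."""
    start = None
    for s in samples:
        if s.idle == 0:
            if start is None:
                start = s.time
        else:
            start = None  # run broken, idle servers reappeared
    return start


samples = [parse_line(l) for l in SAMPLES.splitlines() if l.strip()]
print(saturated_since(samples))
```

If the field assumption holds, the same function could be run from cron against the live log to flag the moment no idle servers remain, before the server stops answering status requests altogether.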
