>Number: 1107
>Category: general
>Synopsis: Runaway httpd process under heavy load
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: apache (Apache HTTP Project)
>State: open
>Class: sw-bug
>Submitter-Id: apache
>Arrival-Date: Tue Sep 9 11:30:02 1997
>Originator: [EMAIL PROTECTED]
>Organization: apache
>Release: 1.2.4
>Environment:
Linux www 2.0.29 #5 Sat Sep 6 12:27:17 CDT 1997 i586 (also on 2.0.30)
gcc version 2.7.2.1 (also 2.7.2.2)
>Description:
Under moderate to heavy loads (200+ open servers), Apache child processes will periodically "lock up". I compiled with -g and found that the process seems to be dying in select() under heavy load (possibly a result of insufficient file descriptors?).
Killing the process always restores the machine to full operation. The problem in the code is a hard loop in child_main() in http_main.c: if an error occurs that results in srv <= 0, execution IMMEDIATELY loops back to call select() again, which causes another error, and so on.
>How-To-Repeat:
Under heavy loads running Linux, the problem happens with enough frequency to be Real Damn Annoying(tm). Get a site doing 200+ simultaneous connections and there's a good chance it'll happen at some point.
>Fix:
In line 1783 (http_main.c, child_main()), change the 'continue' to an 'exit'. If one SIGTERMs the runaway process, the undesirable behavior doesn't travel over to the other children (not for a while, anyway). This is just a workaround. I think the problem lies within Linux itself.
>Audit-Trail:
>Unformatted:
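
The loop described under >Description: and the workaround under >Fix: can be illustrated with a rough sketch. This is not the http_main.c code and not the exact one-word change proposed above: wait_for_connection() is a hypothetical helper, and it assumes a variation on the workaround in which only EINTR is retried and any other select() failure is handed back to the caller instead of being retried forever.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>

/*
 * Hypothetical helper (not part of Apache): block until listen_fd is
 * readable. Returns the positive select() result on success, or -1 with
 * errno set when select() fails for any reason other than EINTR.
 */
int wait_for_connection(int listen_fd)
{
    /* FD_SET is only defined for descriptors in [0, FD_SETSIZE). */
    assert(listen_fd >= 0 && listen_fd < FD_SETSIZE);

    for (;;) {
        fd_set fds;
        int srv;

        FD_ZERO(&fds);
        FD_SET(listen_fd, &fds);

        /* NULL timeout: block until the descriptor is ready or an error occurs. */
        srv = select(listen_fd + 1, &fds, NULL, NULL, NULL);

        if (srv > 0)
            return srv;   /* a connection is waiting */

        if (srv < 0 && errno != EINTR)
            return -1;    /* persistent error: do NOT loop, let the caller exit */

        /* Interrupted by a signal (or an unexpected 0): retrying is safe. */
    }
}
```

A child process would then do something like `if (wait_for_connection(fd) < 0) exit(1);` rather than `continue`, so that a select() call that fails on every attempt ends the child instead of spinning. Whether exiting is the right long-term behavior is a separate question; as noted above, this only works around the symptom.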