>Number: 3546 >Category: os-solaris >Synopsis: httpd children become unkillable (NOT NFS problem) >Confidential: no >Severity: serious >Priority: medium >Responsible: apache >State: open >Class: sw-bug >Submitter-Id: apache >Arrival-Date: Thu Dec 17 04:30:00 PST 1998 >Last-Modified: >Originator: [EMAIL PROTECTED] >Organization: apache >Release: 1.3.3 >Environment: SunOS 5.6 Generic_105181-11 sun4d sparc SUNW,SPARCserver-1000 gcc 2.8.1 patches list: 105181-09 105393-07 105558-03 105623-01 105755-07 106017-01 106222-01 106388-01 106556-01 106828-01 105181-11 105395-03 105562-03 105633-05 105786-06 106040-03 106226-01 106429-01 106569-01 106863-01 105210-15 105401-16 105566-05 105651-08 105786-07 106040-10 106235-02 106435-01 106592-01 106882-01 105210-17 105403-02 105568-11 105654-03 105795-05 106123-02 106242-01 106439-02 106639-02 106929-01 105223-05 105405-02 105570-01 105665-03 105797-05 106125-05 106257-04 106448-01 106641-01 105284-16 105463-05 105572-07 105667-02 105800-03 106150-02 106271-04 106466-01 106651-01 105284-18 105472-06 105591-02 105669-04 105800-05 106172-04 106285-01 106471-01 106653-01 105356-07 105486-04 105600-06 105693-05 105836-03 106173-03 106292-02 106495-01 106655-01 105357-02 105490-05 105615-04 105703-07 105874-01 106183-04 106301-01 106497-01 106797-01 105375-09 105529-04 105621-08 105720-06 105924-03 106193-03 106303-01 106507-01 106808-01 105379-05 105529-05 105621-09 105755-06 105924-05 106219-02 106361-03 106522-01 106818-01 >Description: After a while, more and more children hung and do not respond to *any* signal. Neither gdb nor truss can load such processes. (for example, gdb doesn't proceed further than the message like : "Attaching to program `/usr/local/bin/httpd', process 7513"
Adding some additional logging to src/main/http_main.c revealed that they hung in one of 2 places: during csd = accept(sd, &sa_client, &clen) (in child_main()) or (less often) during setsockopt() in disable_nagle(). The compilation was standard (with --enable-module=auth_dbm) I tried also to define WORKAROUND_SOLARIS_BUG but it didn't help. The server doesn't mount anything via NFS. The server is quite busy, and there are 9 virtual hosts. KeepAlive On MaxKeepAliveRequests 30 KeepAliveTimeout 15 Apparently the problem appeared after upgrading to Solaris 2.6. >How-To-Repeat: It is not reproducible (doesn't depend on URL, client, or server load) but invariably starts to happen in 2-10 hours after restart. >Fix: >Audit-Trail: >Unformatted: [In order for any reply to be added to the PR database, ] [you need to include <[EMAIL PROTECTED]> in the Cc line ] [and leave the subject line UNCHANGED. This is not done] [automatically because of the potential for mail loops. ] [If you do not include this Cc, your reply may be ig- ] [nored unless you are responding to an explicit request ] [from a developer. ] [Reply only with text; DO NOT SEND ATTACHMENTS! ]
