http://issues.apache.org/bugzilla/show_bug.cgi?id=30627

possible bug in handling ALARM signals on Solaris 9

           Summary: possible bug in handling ALARM signals on Solaris 9
           Product: Apache httpd-1.3
           Version: 1.3.31
          Platform: Sun
        OS/Version: Solaris
            Status: NEW
          Severity: Major
          Priority: Other
         Component: core
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]

Our central campus web server has been having stability issues ever since we
upgraded to Solaris 9. Initially we just copied the apache binaries from the
previous installation, but we've recently rebuilt using apache 1.3.31 and see
the same behavior. In general, we see two problems, which I think have the
same cause:

1. Apache servers will occasionally fail to acquire an fcntl accept lock,
   causing the server to exit.
2. Apache servers occasionally segfault.

We tried changing the location of the lockfile and the type of the lockfile
without any luck. After trussing the apache servers during these two different
problems, I noticed that in both cases, immediately before the segfault or
EDEADLK, apache receives an ALARM signal interrupting an lwp_park system call.
Normally the ALARMs just come in during reads/writes from what I can see.
Anyways, it seems that the ALARM is received in a thread other than lwp#1,
which seems to handle the main loop.

In the following trace, apache is clearly working in LWP#1, but after an ALARM
signal is received inside lwp_park, control seems to go to a different thread,
with unexpected results:

/1: poll(0xFFBFF8B8, 1, 0) = 0
/1: write(7, " H T T P / 1 . 1 3 0 4".., 222) = 222
/1: door_info(4, 0xFFBFD5E0) = 0
/1: door_call(4, 0xFFBFD5C8) = 0
/1: time() = 1092327277
/1: write(6, " u b - c o u n s e l i n".., 207) = 207
/1: times(0x7EAC09CC) = 14875555
/1: llseek(8, 0, SEEK_CUR) = 0
/1: close(8) = 0
/1: sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70) = 0
/1: read(7, 0x004E1CF0, 4096) (sleeping...)
/203: lwp_park(0x7F71FC98, 0) Err#62 ETIME
/203: lwp_park(0x7F71FC98, 0) (sleeping...)
/203: Received signal #14, SIGALRM, in lwp_park() [caught]
/203: lwp_park(0x7F71FC98, 0) Err#4 EINTR
/203: sigprocmask(SIG_SETMASK, 0x7F71F7DC, 0x00000000) = 0
/1: read(7, 0x004E1CF0, 4096) Err#9 EBADF
/203: close(7) = 0
/203: getcontext(0x7F71F538)
/203: sigprocmask(SIG_SETMASK, 0x7F83A074, 0x7F71F300) = 0
/203: lwp_unpark(203, 1) = 0
/203: setcontext(0x7F71F310)
/1: time() = 1092327294
/1: close(-1) Err#9 EBADF
/1: sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70) = 0
/203: sigaction(SIGALRM, 0xFFBFF950, 0xFFBFFA70) = 0
/203: sigaction(SIGUSR1, 0xFFBFF950, 0xFFBFFA70) = 0
/203: fcntl(21, F_SETLKW, 0x004B444C) Err#45 EDEADLK
/203: time() = 1092327294
/203: write(15, " [ T h u A u g 1 2 ".., 229) = 229
/203: sigaction(SIGHUP, 0xFFBFF890, 0xFFBFF9B0) = 0
/203: sigaction(SIGUSR1, 0xFFBFF890, 0xFFBFF9B0) = 0
/203: lwp_mutex_lock(0x7F838A00) = 0
/203: write(1, " L a u n c h i n g . . .".., 48) = 48
/203: _exit(15)

here is another trace of the deadlock where I just watched open/close/fcntl:

/1: close(8) = 0
/757: Received signal #14, SIGALRM, in lwp_park() [caught]
/757: close(7) = 0
/1: fcntl(21, F_SETLKW, 0x004B4428) = 0
/1: fcntl(7, F_SETFD, 0x00000001) = 0
/1: fcntl(7, F_GETFL, 0x00000000) = 130
/1: fcntl(7, F_SETFL, 0x00000002) = 0
/1: open("/info/www/.htaccess", O_RDONLY) Err#2 ENOENT
...stuff deleted...
/1: close(40) = 0
/1: close(8) = 0
/1: close(7) = 0
/1: fcntl(21, F_SETLKW, 0x004B444C) Err#45 EDEADLK

and here is a trace of the same sort of signal handling, resulting in a
segfault: (this one is very odd in that it seems two threads are trying to
execute the same code concurrently)

/1: close(8) = 0
/1: close(58) = 0
/1: close(56) = 0
/1: close(45) = 0
/194: Received signal #14, SIGALRM, in lwp_park() [caught]
/1: close(7) Err#9 EBADF
/194: close(7) = 0
/1: fcntl(21, F_SETLKW, 0x004B444C) (sleeping...)
/194: fcntl(21, F_SETLKW, 0x004B444C) (sleeping...)
/194: fcntl(21, F_SETLKW, 0x004B444C) = 0
/1: fcntl(21, F_SETLKW, 0x004B444C) = 0
/1: fcntl(21, F_SETLKW, 0x004B4428) = 0
/1: fcntl(7, F_SETFD, 0x00000001) = 0
/1: fcntl(7, F_GETFL, 0x00000000) = 130
/194: Incurred fault #6, FLTBOUNDS %pc = 0x7F952540
/194: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
/194: Received signal #11, SIGSEGV [caught]
/194: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000
/1: fcntl(7, F_SETFL, 0x00000002) = 0
/1: Received signal #11, SIGSEGV [default]
/1: siginfo: SIGSEGV pid=16796 uid=60001
/194: Incurred fault #6, FLTBOUNDS %pc = 0x7F952540
/194: siginfo: SIGSEGV SEGV_MAPERR addr=0x00000000

We're running apache 1.3.31 with a bunch of modules. Since this appears to be
a race condition that we only see under load and we can't disable modules on
our production server, I haven't tested disabling individual modules.
mod_auth_dce uses threads, but we haven't seen this problem before when using
mod_auth_dce. php4, fastcgi and apache-ssl are also being used.

Anyways, the reason I feel this is related to ALARM handling on solaris 9 is
because of a note in the solaris 9 developer docs:

http://docs.sun.com/db/doc/806-6867/6jfpgdcnt?q=alarm&a=view

>Effective with the Solaris 9 Operating Environment, calls to alarm() or to
>setitimer(ITIMER_REAL) will cause the resulting SIGALRM signal to be sent to
>the process.

some info on our server:

> /usr/local/apache/httpsd -V
Server version: Apache/1.3.31 Ben-SSL/1.55 (Unix)
Server built:   Aug 4 2004 10:28:40
Server's Module Magic Number: 19990320:16
Server compiled with....
 -D HAVE_MMAP
 -D USE_MMAP_SCOREBOARD
 -D USE_MMAP_FILES
 -D NO_WRITEV
 -D HAVE_FCNTL_SERIALIZED_ACCEPT
 -D HAVE_SYSVSEM_SERIALIZED_ACCEPT
 -D HAVE_PTHREAD_SERIALIZED_ACCEPT
 -D DYNAMIC_MODULE_LIMIT=64
 -D HARD_SERVER_LIMIT=1024
 -D HTTPD_ROOT="/usr/local/apache"
 -D SUEXEC_BIN="/usr/local/apache/bin/suexec"
 -D DEFAULT_PIDLOG="logs/httpd.pid"
 -D DEFAULT_SCOREBOARD="logs/apache_runtime_status"
 -D DEFAULT_LOCKFILE="logs/accept.lock"
 -D DEFAULT_ERRORLOG="logs/error_log"
 -D TYPES_CONFIG_FILE="conf/mime.types"
 -D SERVER_CONFIG_FILE="conf/httpd.conf"
 -D ACCESS_CONFIG_FILE="conf/access.conf"
 -D RESOURCE_CONFIG_FILE="conf/srm.conf"

> /usr/local/apache/httpsd -l
Compiled-in modules:
  http_core.c
  mod_php4.c
  mod_env.c
  mod_log_config.c
  mod_mime_magic.c
  mod_mime.c
  mod_negotiation.c
  mod_status.c
  mod_info.c
  mod_include.c
  mod_autoindex.c
  mod_dir.c
  mod_cgi.c
  mod_fastcgi.c
  mod_asis.c
  mod_imap.c
  mod_actions.c
  mod_speling.c
  mod_userdir.c
  mod_alias.c
  mod_rewrite.c
  mod_access.c
  mod_auth_dce.c
  mod_auth.c
  mod_expires.c
  mod_headers.c
  mod_setenvif.c
  apache_ssl.c
suexec: disabled; invalid wrapper /usr/local/apache/bin/suexec
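
For anyone who wants to experiment with the behaviour described in the Solaris
9 note quoted above (SIGALRM from alarm() is sent to the process, so any
thread that has it unblocked may run the handler), here is a minimal sketch in
C. It is not Apache or mod_auth_dce code, and it does not establish that this
is the cause of the traces above. It only illustrates one common POSIX
technique, assuming a module's helper threads can be created with SIGALRM
blocked: block the signal with pthread_sigmask() before pthread_create() so
the helper inherits the blocked mask, then let the main thread take the
signal. The names on_alarm, helper_main and alarm_lands_on_main_thread are
hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

static pthread_t main_thread;                    /* thread that should own SIGALRM */
static volatile sig_atomic_t ran_on_main = -1;   /* -1 until the handler has run */

/* Hypothetical handler: records which thread the signal was delivered to.
 * pthread_equal() is used here for demonstration purposes only. */
static void on_alarm(int sig)
{
    (void)sig;
    ran_on_main = pthread_equal(pthread_self(), main_thread) ? 1 : 0;
}

/* Hypothetical helper thread: stays alive until the main thread writes a
 * byte to the pipe. It inherits the creator's signal mask, so SIGALRM is
 * blocked here. */
static void *helper_main(void *arg)
{
    int fd = *(int *)arg;
    char byte;

    while (read(fd, &byte, 1) == -1 && errno == EINTR)
        ;   /* retry if some other signal interrupts the read */
    return NULL;
}

/* Returns 1 if the SIGALRM handler ran on the calling (main) thread,
 * 0 if it ran elsewhere, -1 on setup failure. Takes about one second. */
int alarm_lands_on_main_thread(void)
{
    struct sigaction sa;
    sigset_t alrm, saved, waitmask;
    pthread_t helper;
    int fds[2];
    char byte = 0;
    ssize_t n;

    main_thread = pthread_self();
    ran_on_main = -1;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0 || pipe(fds) != 0)
        return -1;

    /* Block SIGALRM BEFORE creating the helper so it inherits the mask. */
    sigemptyset(&alrm);
    sigaddset(&alrm, SIGALRM);
    if (pthread_sigmask(SIG_BLOCK, &alrm, &saved) != 0)
        return -1;
    if (pthread_create(&helper, NULL, helper_main, &fds[0]) != 0)
        return -1;

    /* alarm() raises a process-directed SIGALRM after one second. */
    alarm(1);

    /* Atomically unblock SIGALRM in this thread only and wait for it. */
    waitmask = saved;
    sigdelset(&waitmask, SIGALRM);
    while (ran_on_main == -1)
        sigsuspend(&waitmask);

    /* Restore the original mask, release the helper, and clean up. */
    pthread_sigmask(SIG_SETMASK, &saved, NULL);
    n = write(fds[1], &byte, 1);
    assert(n == 1);
    (void)n;
    pthread_join(helper, NULL);
    close(fds[0]);
    close(fds[1]);
    return ran_on_main;
}
```

Calling alarm_lands_on_main_thread() from a small main() and printing the
result is enough to try it (build with the platform's pthread option).
Removing the pthread_sigmask(SIG_BLOCK, ...) call and leaving SIGALRM
unblocked in the helper would be the variation to test on Solaris 9 to see
whether the handler can end up running in the helper thread instead.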
