We upgraded from AOLserver v3.4 to v4.01, and since then the server has been hanging
intermittently without leaving any clues in the error log. The AOLservers run under
SuSE 8.1 with the Linux 2.4.19-64GB-SMP kernel.
Below are some 'ps' outputs of the AOLserver process in its failed state that may be
interesting:
% ps -ef | grep 17072
UID PID PPID C STIME TTY TIME CMD
nsadmin 17072 900 0 11:54 ? 00:00:02 /apps/aolserver-4.01/bin/nsd -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser -g nsgroup
nsadmin 17098 17072 0 11:54 ? 00:00:00 [nsd <defunct>]
% ps -ely | grep 17072
S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
S 110 17072 900 0 78 0 20852 7804 rt_sig ? 00:00:02 nsd
Z 110 17098 17072 0 75 0 0 0 do_exi ? 00:00:00 nsd <defunct>
'pstree' output:
|-supervise,900) web7-demo-8607
| `-nsd,17072) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
| `-(nsd,17098)
Typically, the 'pstree' output looks something like this (when the server is running):
% pstree -ap 900
supervise,900) web7-demo-8607
`-nsd,15008) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
`-nsd,15010) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
|-nsd,15011) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
|-nsd,15012) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
|-nsd,15022) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
|-nsd,15246) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
`-nsd,16138) -i -t /web/aol-configs/web7-demo-8607.tcl -u nsuser
So, it looks to me like the subprocess or thread is trying to exit or has exited, but
the parent is hung up while waiting for the child.
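In case it helps anyone check their own servers for the same state, here is a minimal Python sketch that picks defunct children out of 'ps -ely' output like the listing above. The function name zombie_children is my own and is not part of AOLserver, and the parsing assumes the column layout shown in this post (state first, PID third, PPID fourth); other 'ps' flags would need different indexes.

```python
def zombie_children(ps_output, parent_pid):
    """Return PIDs of defunct (state Z) entries whose PPID is parent_pid.

    Assumes `ps -ely` style columns:
    S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split()
        # Skip blank lines and the header row, whose PID/PPID columns are not numeric.
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        state, pid, ppid = fields[0], int(fields[2]), int(fields[3])
        if state == "Z" and ppid == parent_pid:
            zombies.append(pid)
    return zombies


# Sample taken from the 'ps -ely' output in this post.
sample = """\
S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
S 110 17072 900 0 78 0 20852 7804 rt_sig ? 00:00:02 nsd
Z 110 17098 17072 0 75 0 0 0 do_exi ? 00:00:00 nsd <defunct>
"""
print(zombie_children(sample, 17072))  # prints [17098]
```

Run periodically against the nsd PID reported by supervise, something like this could flag the hung state before users notice, but it only detects the symptom and says nothing about the cause.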
Here's the view seen via the debugger:
# gdb
GNU gdb 5.2.1
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i586-suse-linux".
(gdb) attach 17072
Attaching to process 17072
Reading symbols from /nfs/nfs1/apps/aolserver-4.01/bin/nsd...done.
Reading symbols from /apps/aolserver-4.01/lib/libnsd.so...done.
Loaded symbols for /apps/aolserver-4.01/lib/libnsd.so
Reading symbols from /apps/aolserver-4.01/lib/libnsthread.so...done.
Loaded symbols for /apps/aolserver-4.01/lib/libnsthread.so
Reading symbols from /apps/aolserver-4.01/lib/libtcl8.4.so...done.
Loaded symbols for /apps/aolserver-4.01/lib/libtcl8.4.so
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/libpthread.so.0...done.
[New Thread 1024 (LWP 17072)]
[New Thread 2049 (LWP 17098)]
Error while reading shared library symbols:
Can't attach LWP 17098: Operation not permitted
Reading symbols from /lib/libm.so.6...done.
Loaded symbols for /lib/libm.so.6
Reading symbols from /lib/libc.so.6...done.
Loaded symbols for /lib/libc.so.6
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /apps/aolserver-4.01/bin/nsperm.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/nsperm.so
Reading symbols from /apps/aolserver-4.01/bin/nssock.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/nssock.so
Reading symbols from /apps/aolserver-4.01/bin/nslog.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/nslog.so
Reading symbols from /apps/aolserver-4.01/bin/dqd_utils8.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/dqd_utils8.so
Reading symbols from /apps/aolserver-4.01/bin/ns_blowfish.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/ns_blowfish.so
Reading symbols from /apps/aolserver-4.01/bin/base64.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/base64.so
Reading symbols from /apps/aolserver-4.01/bin/qmail-trigger-pull.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/qmail-trigger-pull.so
Reading symbols from /apps/aolserver-4.01/bin/aolserver-pfpro.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/aolserver-pfpro.so
Reading symbols from /apps/lib/libpfpro.so...done.
Loaded symbols for /apps/lib/libpfpro.so
Reading symbols from /apps/aolserver-4.01/bin/nsdb.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/nsdb.so
Reading symbols from /apps/aolserver-4.01/lib/libnsdb.so...done.
Loaded symbols for /apps/aolserver-4.01/lib/libnsdb.so
Reading symbols from /apps/aolserver-4.01/bin/ora8.so...done.
Loaded symbols for /apps/aolserver-4.01/bin/ora8.so
Reading symbols from /opt/app/oracle/product/8.1.7/lib/libclntsh.so.8.0...done.
Loaded symbols for /opt/app/oracle/product/8.1.7/lib/libclntsh.so.8.0
Reading symbols from /opt/app/oracle/product/8.1.7/lib/libwtc8.so...done.
Loaded symbols for /opt/app/oracle/product/8.1.7/lib/libwtc8.so
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
0x4016dea9 in sigsuspend () from /lib/libc.so.6
(gdb)
(gdb) where
#0 0x4016dea9 in sigsuspend () from /lib/libc.so.6
#1 0x40115506 in sigwait () from /lib/libpthread.so.0
#2 0x400640f0 in ns_sigwait () from /apps/aolserver-4.01/lib/libnsthread.so
#3 0x4004e108 in NsHandleSignals () from /apps/aolserver-4.01/lib/libnsd.so
#4 0x40035d9c in Ns_Main () from /apps/aolserver-4.01/lib/libnsd.so
#5 0x080485c7 in main ()
#6 0x4015c4a2 in __libc_start_main () from /lib/libc.so.6
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Can't detach LWP 17099: No such process
The same AOLserver runs fine under Solaris 9.
Has anyone seen this before or have any suggestions for further diagnosing the failure?
- Fen Tamanaha
--
AOLserver - http://www.aolserver.com/