Chad,

Something is causing the cfserver process to crash with a SIGSEGV (signal 11). Since session variables are kept in memory, they go away when the server crashes, which is why your users see 'timeouts'.

In general, you need to isolate which page or operation is the root cause of the crash. Very common causes are 3rd party software (CFX's, database libraries, etc.) or improper locking of session, application or server scope CFML variables. There is a very good KB article on the Macromedia support site; do a search for CFLOCK. My guess is you have a locking problem.

Since cfserver is a multithreaded Linux program, core files are useless, and there seems to be no way to get post-mortem stack traces of where the crash actually happens without patching the kernel. Jesse Noller ([EMAIL PROTECTED]) has more info about this.

Cfserver establishes a signal handler for various crashing signals. The signal handler attempts to log a message in server.log and then performs an abort(), which is where the SIGABRT comes from (see the abort(3) man page).

The cfexec process is a watchdog: it monitors the other processes for crashes and restarts them as needed. See executive.log for restart messages.

With a little bit of work, you should be able to track down the cause of your instability.

Good luck!

--
Tom Jordahl
Macromedia server development

-----Original Message-----
From: chad [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 03, 2002 5:52 PM
To: CF-Linux
Subject: cfserver getting signal 11 or signal 6 ?

Hi All,

We have been getting complaints that session variables are timing out on our sites running CF5.0 on Linux. We have found entries like this in the server.log file:

"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."

Grabbing a core file and running gdb gave the following snippets:

GNU gdb 5.0
(no debugging symbols found)...
Core was generated by `/opt/coldfusion/bin/cfserver'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/coldfusion/libxerces-c1_2.so...done.
Loaded symbols for /usr/lib/coldfusion/libxerces-c1_2.so
.
(lots of .so's later)
...
#0 0x409b856a in sigsuspend () from /lib/libc.so.6
(gdb) bt
#0 0x409b856a in sigsuspend () from /lib/libc.so.6
#1 0x4093049d in __pthread_wait_for_restart_signal (self=0xbe1ffe40) at pthread.c:785
#2 0x4092d4ba in pthread_cond_wait (cond=0x940006c, mutex=0x9400054) at restart.h:26
#3 0x408eb77d in ThreadPoolConsume () from /usr/lib/coldfusion/libporting.so
#4 0x408febd3 in BtThreadBase () from /usr/lib/coldfusion/libporting.so
#5 0x4092e5d7 in pthread_start_thread (arg=0xbe1ffe40) at manager.c:241
(gdb) info f
Stack level 0, frame at 0xbe1ffc50:
 eip = 0x409b856a in sigsuspend; saved eip 0x4093049d
 called by frame at 0xbe1ffce8
 Arglist at 0xbe1ffc50, args:
 Locals at 0xbe1ffc50, Previous frame's sp is 0x0
 Saved registers:
  ebx at 0xbe1ffc38, ebp at 0xbe1ffc50, esi at 0xbe1ffc3c, edi at 0xbe1ffc40, eip at 0xbe1ffc54
(gdb) info args
No symbol table info available.

A bit of assembly shows that the EIP (0x409b856a) points to just after the 0xb3 syscall (sys_rt_sigsuspend).

0x409b8560 <sigsuspend+52>: push %ebx
0x409b8561 <sigsuspend+53>: mov %edi,%ebx
0x409b8563 <sigsuspend+55>: mov $0xb3,%eax
0x409b8568 <sigsuspend+60>: int $0x80
0x409b856a <sigsuspend+62>: pop %ebx

So it looks like a thread suspended itself, received a SIGABRT (signal 6), was woken up, and proceeded to die. What I wonder about is the server.log that claims a signal 11 was received. I suspect that cfexec monitors cfserver and spawns a new cfserver when cfserver dies. I (fortunately :) ?) don't have experience debugging pthreaded code. Does anyone have any ideas? I don't know where the abort signal came from.

The plan is to install a kernel patch/hook to log signals 6 and 11. Unfortunately it is difficult to localize what CF code could be causing this as we have literally thousands of .cfm files.
Sorry about that.

We did have troubles with the Merant MySQL driver and are now using a MyODBC-2.50.39 driver. Before I had a core file (and still thought it was a sig 11) I tried upping the stack limit from 8Megs to 32Megs, and then it did go 72 minutes before restarting. I'm told that there doesn't seem to be the session timeout problems on Windows 2000 but that it does run slower.

Thanks for any ideas at all!

Chad < chad @ webcorelabs . com >
Jr Sys Admin
Webcore Labs Inc
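
For anyone reading this thread later: the reply at the top explains the signal 11 in server.log versus signal 6 in the core file as a handler that logs the original signal and then calls abort(). Below is a minimal sketch of that pattern, not ColdFusion code. The child script, the `on_fatal` handler and the `run_child` helper are hypothetical names made up for the illustration, the SIGSEGV is simulated with kill() rather than a real bad memory access, and it assumes a Unix-like system.

```python
import signal
import subprocess
import sys
import textwrap

# Hypothetical stand-in for cfserver: log the fatal signal, then abort().
CHILD = textwrap.dedent("""
    import os, resource, signal, sys, time

    # Do not leave a core file behind from this demonstration.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

    def on_fatal(signum, frame):
        # The log line records the ORIGINAL signal number (11 on Linux)...
        sys.stderr.write("Caught a fatal signal (%d) - Aborting\\n" % signum)
        sys.stderr.flush()
        # ...but the process actually dies from the SIGABRT raised here.
        os.abort()

    signal.signal(signal.SIGSEGV, on_fatal)
    os.kill(os.getpid(), signal.SIGSEGV)  # simulated crash, not a real fault
    time.sleep(1)  # normally never completes: the handler aborts first
""")


def run_child():
    """Run the child and return (returncode, stderr text)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stderr=subprocess.PIPE,
        text=True,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, log = run_child()
    print(log.strip())
    # subprocess reports death by signal N as returncode -N.
    print("child was killed by signal", -code)
```

The log line names the signal the handler caught, while the exit status (and any core file) reflects the later SIGABRT, which matches the mismatch between server.log and gdb described above.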

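
To see how long the server stays up between crashes (the original message mentions one run of 72 minutes), the server.log entries quoted above could be scanned with something like the following. This is only a sketch under assumptions: the column meanings are guessed from the four sample lines, the date is assumed to be MM/DD/YY, and `crash_uptimes` is a made-up helper name.

```python
import csv
import io
from datetime import datetime

# The four entries quoted in the original message.
SAMPLE = '''"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."
'''


def crash_uptimes(lines):
    """Yield (crash_time, signal_number, uptime) for each fatal-signal entry.

    uptime is a timedelta since the last "started" entry, or None when the
    log excerpt does not contain the preceding start.
    """
    started = None
    for row in csv.reader(lines):
        if len(row) < 6:
            continue  # skip blank or malformed lines
        # Assumed layout: severity, id, date, time, (empty), message.
        severity, _ident, date, time_of_day, _unused, message = row[:6]
        when = datetime.strptime(date + " " + time_of_day, "%m/%d/%y %H:%M:%S")
        if "Application Server started" in message:
            started = when
        elif severity == "Fatal" and "Caught a fatal signal" in message:
            signum = int(message.split("(")[1].split(")")[0])
            uptime = (when - started) if started is not None else None
            yield when, signum, uptime


if __name__ == "__main__":
    for when, signum, uptime in crash_uptimes(io.StringIO(SAMPLE)):
        print(when, "signal", signum, "uptime", uptime)
```

For a real log, pass an open file instead of the sample, for example `open("server.log", newline="")`; the path depends on the installation. Lining up the crash times against the web server access log may help narrow down which pages were being served just before each crash.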