Chad,

Something is causing the cfserver process to crash with a SIGSEGV (signal 11).  Since 
session variables are kept in memory, they go away when the server crashes which is 
why your users see 'timeouts'.

In general, you need to isolate which page or operation is the root cause of the crash.
Very common causes are third-party software (CFX tags, database libraries, etc.) or
improper locking of session, application or server scope CFML variables.  There is a
very good KB article on the Macromedia support site; do a search for CFLOCK.  My guess
is that you have a locking problem.
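
As a language-neutral analog only (it is not CFML, and not how ColdFusion stores sessions internally), the Python sketch below shows the discipline CFLOCK enforces: every read-modify-write of a shared scope happens while one exclusive lock is held. The names `session_scope`, `scope_lock` and `record_hits` are hypothetical. In Python an unlocked race loses updates rather than crashing; in a native server such as cfserver, unsynchronized access to shared structures is the kind of bug that can end in a SIGSEGV.

```python
import threading

# Hypothetical stand-in for a shared CFML scope (session/application/server).
session_scope = {"hits": 0}
# A single exclusive lock for that scope, playing the role CFLOCK plays in CFML.
scope_lock = threading.Lock()

def record_hits(times):
    """Increment a shared counter; each read-modify-write holds the lock."""
    for _ in range(times):
        # Without the lock, another thread could run between the read and
        # the write, and one of the two updates would be lost.
        with scope_lock:
            session_scope["hits"] = session_scope["hits"] + 1

workers = [threading.Thread(target=record_hits, args=(10_000,)) for _ in range(8)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(session_scope["hits"])  # 80000
```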

Since cfserver is a multithreaded Linux program, core files are useless and there 
seems to be no way to get post-mortem stack traces of where the crash actually happens 
without patching the kernel.  Jesse Noller ([EMAIL PROTECTED]) has more info 
about this.

Cfserver establishes a signal handler for various crashing signals.  The signal handler
attempts to log a message in server.log and then performs an abort(), which is where
the SIGABRT comes from (see the abort(3) man page).  The cfexec process is a watchdog:
it monitors the other processes for crashes and restarts them as needed.  See
executive.log for restart messages.
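
The handler-then-abort() sequence explains why server.log reports signal 11 while the core file says signal 6. A minimal Python sketch of that sequence, assuming Linux and Python 3.8+ (an illustration, not cfserver's code; `crash_like_cfserver` is a hypothetical name): a forked child installs a SIGSEGV handler that logs a line and calls abort(), then raises SIGSEGV on itself, and the parent reports which signal actually killed it.

```python
import os
import resource
import signal
import tempfile

def crash_like_cfserver(log_path):
    """Fork a child whose SIGSEGV (11) handler logs a line and calls abort();
    return the signal that terminated the child, as its parent sees it."""
    pid = os.fork()
    if pid == 0:  # child process
        try:
            # Suppress the core file for this demonstration.
            resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

            def on_fatal(signum, frame):
                with open(log_path, "a") as log:
                    log.write(f"Caught a fatal signal ({int(signum)}) - Aborting\n")
                os.abort()  # raises SIGABRT (6), which is what a core file records

            signal.signal(signal.SIGSEGV, on_fatal)
            # raise_signal() runs the Python-level handler before returning.
            signal.raise_signal(signal.SIGSEGV)  # simulate the crash
        finally:
            os._exit(1)  # reached only if the handler failed to abort
    _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None

log_path = os.path.join(tempfile.mkdtemp(), "server.log")
sig = crash_like_cfserver(log_path)
print(signal.Signals(sig).name)  # SIGABRT
with open(log_path) as log:
    print(log.read(), end="")    # Caught a fatal signal (11) - Aborting
```

The parent sees SIGABRT even though the log line names signal 11, the same pair of numbers as in the original report.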

With a little bit of work, you should be able to track down the cause of your
instability.  Good luck!

--
Tom Jordahl
Macromedia server development

-----Original Message-----
From: chad [mailto:[EMAIL PROTECTED]]
Sent: Sunday, February 03, 2002 5:52 PM
To: CF-Linux
Subject: cfserver getting signal 11 or signal 6 ?


Hi All,

We have been getting complaints that session variables are
timing out on our sites running CF5.0 on Linux.  We have found
entries like this in the server.log file:

"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."
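
These entries are comma-separated with quoted fields, so crash times can be pulled out mechanically and lined up against web server access logs to narrow down which pages were running at each crash. A sketch, assuming every line has the six-field layout shown above (the function name `fatal_signals` is hypothetical):

```python
import csv
import io
import re

FATAL_RE = re.compile(r"Caught a fatal signal \((\d+)\)")

def fatal_signals(log_text):
    """Return (date, time, signal_number) for every fatal-signal entry."""
    hits = []
    for row in csv.reader(io.StringIO(log_text)):
        # Field 0 is the severity, fields 2 and 3 the date and time,
        # field 5 the message text.
        if len(row) < 6 or row[0] != "Fatal":
            continue
        match = FATAL_RE.search(row[5])
        if match:
            hits.append((row[2], row[3], int(match.group(1))))
    return hits

SAMPLE = '''\
"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."
'''

for when_date, when_time, signum in fatal_signals(SAMPLE):
    print(when_date, when_time, "signal", signum)
```

Against a real file, pass the contents of server.log (e.g. `open(path, newline="").read()`, with `path` wherever your installation writes it).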

Grabbing a core file and running gdb on it gave the following snippets:

GNU gdb 5.0
(no debugging symbols found)...
Core was generated by `/opt/coldfusion/bin/cfserver'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/coldfusion/libxerces-c1_2.so...done.
Loaded symbols for /usr/lib/coldfusion/libxerces-c1_2.so

... (lots of .so's later) ...

#0  0x409b856a in sigsuspend () from /lib/libc.so.6
(gdb) bt
#0  0x409b856a in sigsuspend () from /lib/libc.so.6
#1  0x4093049d in __pthread_wait_for_restart_signal (self=0xbe1ffe40)
    at pthread.c:785
#2  0x4092d4ba in pthread_cond_wait (cond=0x940006c, mutex=0x9400054)
    at restart.h:26
#3  0x408eb77d in ThreadPoolConsume () from /usr/lib/coldfusion/libporting.so
#4  0x408febd3 in BtThreadBase () from /usr/lib/coldfusion/libporting.so
#5  0x4092e5d7 in pthread_start_thread (arg=0xbe1ffe40) at manager.c:241
(gdb) info f
Stack level 0, frame at 0xbe1ffc50:
 eip = 0x409b856a in sigsuspend; saved eip 0x4093049d
 called by frame at 0xbe1ffce8
 Arglist at 0xbe1ffc50, args:
 Locals at 0xbe1ffc50, Previous frame's sp is 0x0
 Saved registers:
  ebx at 0xbe1ffc38, ebp at 0xbe1ffc50, esi at 0xbe1ffc3c, edi at 0xbe1ffc40, eip at 0xbe1ffc54
(gdb) info args
No symbol table info available.


A bit of disassembly shows that the EIP (0x409b856a) points to
just after the 0xb3 syscall (sys_rt_sigsuspend):

0x409b8560 <sigsuspend+52>:     push   %ebx
0x409b8561 <sigsuspend+53>:     mov    %edi,%ebx
0x409b8563 <sigsuspend+55>:     mov    $0xb3,%eax
0x409b8568 <sigsuspend+60>:     int    $0x80
0x409b856a <sigsuspend+62>:     pop    %ebx
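
The reading of this listing can be checked with a little arithmetic: on i386 Linux the system call number is loaded into %eax before `int $0x80`, 0xb3 is 179 (the number of rt_sigsuspend), and `int $0x80` is a two-byte instruction (CD 80). A thread blocked in that call therefore has a saved EIP of 0x409b8568 + 2, the `pop %ebx` that follows.

```python
# i386 Linux: the system call number goes in %eax before "int $0x80".
SYSCALL_NUMBER = 0xB3
assert SYSCALL_NUMBER == 179  # __NR_rt_sigsuspend on i386

INT80_ADDR = 0x409B8568       # address of "int $0x80" in the listing above
INT80_LEN = len(b"\xcd\x80")  # "int $0x80" is encoded as the two bytes CD 80
saved_eip = INT80_ADDR + INT80_LEN
print(hex(saved_eip))         # 0x409b856a
```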


So it looks like a thread suspended itself, received a SIGABRT
(signal 6), was woken up, and proceeded to die.  What puzzles me
is that server.log claims a signal 11 was received.  I
suspect that cfexec monitors cfserver and spawns a new
cfserver when cfserver dies.
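
A watchdog of the kind the reply above describes for cfexec can be sketched as a fork/waitpid loop. This is a hypothetical Python illustration under stated assumptions (Linux, a fixed restart budget, no delay between restarts), not cfexec's actual logic; `watchdog` and `flaky_server` are invented names. What it shows is that the parent only learns the final signal (here SIGABRT), not whatever the child's own handler logged first.

```python
import os
import resource

def watchdog(start_child, max_restarts):
    """Run start_child() in a forked child; each time the child dies, record
    how it died and start a new one, up to max_restarts restarts.
    Returns a list of ("signal", signum) or ("exit", status) tuples."""
    events = []
    for _ in range(max_restarts + 1):
        pid = os.fork()
        if pid == 0:  # child: run the "server" and never return to the loop
            code = 0
            try:
                start_child()
            except BaseException:
                code = 1
            os._exit(code)
        _, status = os.waitpid(pid, 0)
        if os.WIFSIGNALED(status):
            events.append(("signal", os.WTERMSIG(status)))
        else:
            events.append(("exit", os.WEXITSTATUS(status)))
        # A real watchdog would log here (cf. executive.log) and might pause.
    return events

def flaky_server():
    """Hypothetical server that always dies the way cfserver did: via abort()."""
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core file for the demo
    os.abort()

events = watchdog(flaky_server, max_restarts=2)
print(events)  # [('signal', 6), ('signal', 6), ('signal', 6)]
```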

I (fortunately :) ?) don't have experience debugging pthreaded code.
Does anyone have any ideas?

I don't know where the abort signal came from.  The plan is to
install a kernel patch/hook to log signals 6 and 11.

Unfortunately it is difficult to localize what CF code could be causing
this as we have literally thousands of .cfm files.  Sorry about that.

We did have troubles with the Merant MySQL driver and are now
using a MyODBC-2.50.39 driver.

Before I had a core file (and still thought it was a sig 11), I tried
raising the stack limit from 8 MB to 32 MB, and the server then went
72 minutes before restarting.
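
For reference, the stack limit can be read and raised with getrlimit/setrlimit; in a shell, `ulimit -s 32768` (the value is in KB) before starting cfserver is the usual equivalent. Limits are inherited by child processes, so the change has to be made in whatever launches cfserver. A minimal Python sketch (the function name is hypothetical):

```python
import resource

def raise_stack_soft_limit(target_bytes):
    """Raise this process's soft RLIMIT_STACK to target_bytes (capped at the
    hard limit) and return the resulting soft limit.  Never lowers it."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if hard != resource.RLIM_INFINITY:
        target_bytes = min(target_bytes, hard)
    if soft == resource.RLIM_INFINITY or soft >= target_bytes:
        return soft  # already large enough
    resource.setrlimit(resource.RLIMIT_STACK, (target_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)[0]

new_soft = raise_stack_soft_limit(32 * 1024 * 1024)  # 32 MB
print("unlimited" if new_soft == resource.RLIM_INFINITY else f"{new_soft // 1024} KB")
```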

I'm told that the session timeout problems don't seem to occur
on Windows 2000, but that it does run slower.

Thanks for any ideas at all!

Chad

< chad @ webcorelabs . com >
Jr Sys Admin
Webcore Labs Inc


______________________________________________________________________
This list and all House of Fusion resources hosted by CFHosting.com. The place for 
dependable ColdFusion Hosting.
------------------------------------------------------------------------------
Archives: http://www.mail-archive.com/cf-linux%40houseoffusion.com/
To Unsubscribe visit 
http://www.houseoffusion.com/index.cfm?sidebar=lists&body=lists/cf_linux or send a 
message to [EMAIL PROTECTED] with 'unsubscribe' in the body.
