Hi All,

We have been getting complaints that session variables are
timing out on our sites running CF5.0 on Linux.  We have found
entries like this in the server.log file:

"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."
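
To see how often the server is dying, the Fatal entries can be pulled out of server.log and the gaps between them measured. The following is only a rough sketch: the function names are made up for illustration, the field layout (severity, id, date, time, empty field, message) is inferred from the four lines above, and the date is assumed to be MM/DD/YY.

```python
import csv
import io
import re
from datetime import datetime

# The four entries quoted above, with the wrapped lines rejoined.
SAMPLE = '''"Fatal","8200","02/02/02","17:15:49",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:15:50",,"The ColdFusion Application Server started."
"Fatal","7175","02/02/02","17:51:58",,"Caught a fatal signal (11) - Aborting"
"Information","1024","02/02/02","17:52:00",,"The ColdFusion Application Server started."
'''

SIGNAL_RE = re.compile(r"fatal signal \((\d+)\)")


def fatal_events(log_text):
    """Return a list of (timestamp, signal number) for each Fatal entry."""
    events = []
    for row in csv.reader(io.StringIO(log_text)):
        # Assumed layout: severity, id, date, time, (empty), message.
        if len(row) < 6 or row[0] != "Fatal":
            continue
        match = SIGNAL_RE.search(row[5])
        if match is None:
            continue
        # Assumption: dates are MM/DD/YY.
        when = datetime.strptime(row[2] + " " + row[3], "%m/%d/%y %H:%M:%S")
        events.append((when, int(match.group(1))))
    return events


def crash_intervals(events):
    """Return the time elapsed between consecutive Fatal entries."""
    times = [when for when, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


if __name__ == "__main__":
    events = fatal_events(SAMPLE)
    for when, signum in events:
        print(when, "signal", signum)
    for gap in crash_intervals(events):
        print("time between crashes:", gap)
```

Run against the whole server.log instead of SAMPLE, this would show whether the restarts cluster around particular times of day or load levels.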

Grabbing a core file and running gdb gave the following snippets:

GNU gdb 5.0
(no debugging symbols found)...
Core was generated by `/opt/coldfusion/bin/cfserver'.
Program terminated with signal 6, Aborted.
Reading symbols from /usr/lib/coldfusion/libxerces-c1_2.so...done.
Loaded symbols for /usr/lib/coldfusion/libxerces-c1_2.so

... (lots of .so's later) ...

#0  0x409b856a in sigsuspend () from /lib/libc.so.6
(gdb) bt
#0  0x409b856a in sigsuspend () from /lib/libc.so.6
#1  0x4093049d in __pthread_wait_for_restart_signal (self=0xbe1ffe40)
    at pthread.c:785
#2  0x4092d4ba in pthread_cond_wait (cond=0x940006c, mutex=0x9400054)
    at restart.h:26
#3  0x408eb77d in ThreadPoolConsume () from /usr/lib/coldfusion/libporting.so
#4  0x408febd3 in BtThreadBase () from /usr/lib/coldfusion/libporting.so
#5  0x4092e5d7 in pthread_start_thread (arg=0xbe1ffe40) at manager.c:241
(gdb) info f
Stack level 0, frame at 0xbe1ffc50:
 eip = 0x409b856a in sigsuspend; saved eip 0x4093049d
 called by frame at 0xbe1ffce8
 Arglist at 0xbe1ffc50, args:
 Locals at 0xbe1ffc50, Previous frame's sp is 0x0
 Saved registers:
  ebx at 0xbe1ffc38, ebp at 0xbe1ffc50, esi at 0xbe1ffc3c, edi at 0xbe1ffc40,
  eip at 0xbe1ffc54
(gdb) info args
No symbol table info available.


A bit of assembly shows that the EIP (0x409b856a)  points to
just after the 0xb3 syscall (sys_rt_sigsuspend).

0x409b8560 <sigsuspend+52>:     push   %ebx
0x409b8561 <sigsuspend+53>:     mov    %edi,%ebx
0x409b8563 <sigsuspend+55>:     mov    $0xb3,%eax
0x409b8568 <sigsuspend+60>:     int    $0x80
0x409b856a <sigsuspend+62>:     pop    %ebx


So it looks like a thread suspended itself, received a SIGABRT
(signal 6), was woken up, and proceeded to die.  What I wonder
about is why server.log claims a signal 11 was received.  I
suspect that cfexec monitors cfserver and spawns a new
cfserver when cfserver dies.
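
One possible explanation for the 6-versus-11 mismatch (a guess, not something confirmed about how cfserver is written) is a SIGSEGV handler that logs the message and then calls abort(). The log would then say 11 while the core file and any monitoring parent would see 6. The sketch below only demonstrates that mechanism: the child script and the name run_child are hypothetical, and the signal is sent with os.kill rather than caused by a real memory fault.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a server process. Assumption: the real server
# installs a handler for signal 11 that logs and then aborts.
CHILD = r'''
import os, resource, signal, sys

# Avoid leaving a core file behind when this demo aborts.
resource.setrlimit(resource.RLIMIT_CORE, (0, 0))

def on_segv(signum, frame):
    sys.stderr.write("Caught a fatal signal (%d) - Aborting\n" % signum)
    sys.stderr.flush()
    os.abort()  # raises SIGABRT, so the process actually dies from signal 6

signal.signal(signal.SIGSEGV, on_segv)
# Simulate the fault by sending the signal to ourselves.
os.kill(os.getpid(), signal.SIGSEGV)
'''


def run_child():
    """Run the child and return (exit status as the parent sees it, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-c", CHILD],
        stderr=subprocess.PIPE,
        text=True,
        timeout=30,
    )
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    code, err = run_child()
    print("child logged:", err.strip())
    if code < 0:
        # subprocess reports death by signal N as returncode -N.
        print("parent saw death by signal", -code)
    else:
        print("child exited normally with status", code)
```

If cfserver behaves this way, the SIGABRT is self-inflicted and the real problem is still whatever raised the original signal 11, which would have happened in a different thread from the one sitting in sigsuspend in the backtrace above.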

I (fortunately :) ?) don't have experience debugging pthreaded code.
Does anyone have any ideas?

I don't know where the abort signal came from.  The plan is to
install a kernel patch/hook to log signals 6 and 11.

Unfortunately it is difficult to localize what CF code could be causing
this as we have literally thousands of .cfm files.  Sorry about that.

We did have troubles with the Merant MySQL driver and are now
using a MyODBC-2.50.39 driver.

Before I had a core file (and still thought it was a sig 11) I tried
upping the stack limit from 8 MB to 32 MB and then it did go
72 minutes before restarting.
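
For reference, the stack limit change can be checked or applied from a wrapper before the server is launched. This is a minimal sketch using Python's resource module; the helper names are made up, and it only affects the calling process and whatever it starts afterwards. Whether the limit applies to the stacks of individual pthreads, and not just the main thread, depends on the threading library and is not something this sketch settles.

```python
import resource

MB = 1024 * 1024


def describe(limit):
    """Format an rlimit value for printing."""
    if limit == resource.RLIM_INFINITY:
        return "unlimited"
    return "%d MB" % (limit // MB)


def raise_stack_soft_limit(target_bytes):
    """Raise the soft stack limit to target_bytes, capped at the hard limit.

    Returns the (soft, hard) pair in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    if hard != resource.RLIM_INFINITY:
        target_bytes = min(target_bytes, hard)
    # Nothing to do if the current soft limit is already large enough.
    if soft == resource.RLIM_INFINITY or soft >= target_bytes:
        return soft, hard
    resource.setrlimit(resource.RLIMIT_STACK, (target_bytes, hard))
    return resource.getrlimit(resource.RLIMIT_STACK)


if __name__ == "__main__":
    before = resource.getrlimit(resource.RLIMIT_STACK)
    print("before: soft", describe(before[0]), "hard", describe(before[1]))
    after = raise_stack_soft_limit(32 * MB)
    print("after:  soft", describe(after[0]), "hard", describe(after[1]))
```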

I'm told that the session timeout problems don't seem to occur
on Windows 2000, but that it does run slower there.

Thanks for any ideas at all!

Chad

< chad @ webcorelabs . com >
Jr Sys Admin
Webcore Labs Inc
