I've got a jserv compiled into apache and I'm using it to run a servlet
that does an RMI call for ever doGet that it receives.  While doing a stress
test with the new native thread support, it looks like I've come across a bug
related to memory management or thread creation.

To do the stress test, I had two computers repeatedly doing a request
on the servlet.  Back to back, non stop.  (It was a pretty unrealistic
load for what the web server would do, but I'm also showing that apache
on Linux can stand up to this abuse.)

After doing 1000 hits in about 69 seconds, the vm would seg fault.
The first time I did a trial I had a System.gc() at the bottom of the
doGet to force a clean up and I got this as the stack trace:

SIGSEGV   11*  segmentation violation
        stackbase=BF1FFE7C, stackpointer=BF1FFA10

Full thread dump:
    "Thread-1002" (TID:0x40bd3d98, sys_thread_t:0x8183758, state:R, thread_t: 
t@1040375, sp:0x0 threadID:0x0, stack_base:0x0, stack_size:0x0) prio=5
    "Thread-1001" (TID:0x40bd40d0, sys_thread_t:0x8109698, state:R, thread_t: 
t@1039350, sp:0x40fff334 threadID:0x1d1d, stack_base:0x40fffe7c, stack_size:0x200000) 
prio=10
        java.lang.System.gc(System.java)
        EngineSnoopServlet.doGet(EngineSnoopServlet.java:125)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:486)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:575)
        org.apache.jserv.JServConnection.run(JServConnection.java:349)
    "ConnectionExpiration-10.0.4.50:18287" (TID:0x40bd2a90, sys_thread_t:0x81110d0, 
state:CW, thread_t: t@18450, sp:0xbd7ff830 threadID:0x191a, stack_base:0xbd7ffe7c, 
stack_size:0x200000) prio=10

This lead me to beleive the bug was in memory management since it appeared
to be happening in the call for garbage collection.  To test my theory,
I did a trial again, and again it failed right after the 1000 hit with
a this stack trace that looks very very similar to the first:

SIGSEGV   11*  segmentation violation
        stackbase=BF1FFE7C, stackpointer=BF1FFA10

Full thread dump:
    "Thread-1005" (TID:0x40bd3b78, sys_thread_t:0x812fa50, state:R, thread_t: 
t@1040375, sp:0x0 threadID:0x0, stack_base:0x0, stack_size:0x0) prio=5
    "Thread-1004" (TID:0x40bd3b70, sys_thread_t:0x80c77d8, state:R, thread_t: 
t@1039350, sp:0x40fff334 threadID:0x232c, stack_base:0x40fffe7c, stack_size:0x200000) 
prio=10
        java.lang.System.gc(System.java)
        EngineSnoopServlet.doGet(EngineSnoopServlet.java:125)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:486)
        javax.servlet.http.HttpServlet.service(HttpServlet.java:575)
        org.apache.jserv.JServConnection.run(JServConnection.java:349)
    "ConnectionExpiration-10.0.4.50:18287" (TID:0x40bd3638, sys_thread_t:0x81cfe98, 
state:CW, thread_t: t@14350, sp:0xbdfff830 threadID:0x1f23, stack_base:0xbdfffe7c, 
stack_size:0x200000) prio=10

To see if something different would happen if I didn't call System.gc() I removed that 
from my code.  Again, after 1000 hits, the VM seg faulted, only this
time I got a different stack trace:

Full thread dump:
    "Thread-1005" (TID:0x40c253d0, sys_thread_t:0x812f510, state:R, thread_t: 
t@1040375, sp:0x0 threadID:0x0, stack_base:0x0, stack_size:0x0) prio=5
    "ConnectionExpiration-10.0.4.50:18287" (TID:0x40bd3630, sys_thread_t:0x81d0018, 
state:CW, thread_t: t@14350, sp:0xbdfff830 threadID:0x2629, stack_base:0xbdfffe7c, 
stack_size:0x200000) prio=10
        sun.rmi.transport.tcp.TCPChannel$Reaper.run(TCPChannel.java:458)
        java.lang.Thread.run(Thread.java)
    "ConnectionExpiration-10.0.4.50:18010" (TID:0x40bd2528, sys_thread_t:0x81a12e8, 
state:CW, thread_t: t@13325, sp:0xbe1ff830 threadID:0x2627, stack_base:0xbe1ffe7c, 
stack_size:0x200000) prio=10
        sun.rmi.transport.tcp.TCPChannel$Reaper.run(TCPChannel.java:458)
        java.lang.Thread.run(Thread.java)
    "Cleaner" (TID:0x40bd1220, sys_thread_t:0x81b9000, state:CW, thread_t: t@12300, 
sp:0xbe3ff820 threadID:0x2625, stack_base:0xbe3ffe7c, stack_size:0x200000) prio=10
        sun.rmi.transport.DGCClient.run(DGCClient.java:621)
        java.lang.Thread.run(Thread.java)
    "LeaseRenewer" (TID:0x40bd10d8, sys_thread_t:0x81b8ee0, state:CW, thread_t: 
t@11275, sp:0xbe5ff820 threadID:0x2624, stack_base:0xbe5ffe7c, stack_size:0x200000) 
prio=10
        sun.rmi.transport.DGCClient.doRenewal(DGCClient.java:677)
        sun.rmi.transport.DGCClient$LeaseRenewer.run(DGCClient.java:936)
        java.lang.Thread.run(Thread.java)
    "Reaper" (TID:0x40bd0fe0, sys_thread_t:0x81b7198, state:CW, thread_t: t@10250, 
sp:0xbe7ff830 threadID:0x2623, stack_base:0xbe7ffe7c, stack_size:0x200000) prio=10
        sun.rmi.transport.Reaper.run(ObjectTable.java:199)
        java.lang.Thread.run(Thread.java)
    "ConnectionExpiration-10.0.4.50:1099" (TID:0x40bd0d18, sys_thread_t:0x81a2ba0, 
state:CW, thread_t: t@9225, sp:0xbe9ff830 threadID:0x2622, stack_base:0xbe9ffe7c, 
stack_size:0x200000) prio=10
        sun.rmi.transport.tcp.TCPChannel$Reaper.run(TCPChannel.java:458)
        java.lang.Thread.run(Thread.java)
    "Thread-2" (TID:0x40bcc610, sys_thread_t:0x81113d0, state:R, thread_t: t@5125, 
sp:0xbf1ffa6c threadID:0x2613, stack_base:0xbf1ffe7c, stack_size:0x200000) prio=5 
*current thread*

It appears that something terrible is being hit right after the 1000th
call (and maybe right around the 1000th thread creation?)

Can anyone recreate this behavior?

Keith, who's submiting this to jitterbug as well as to the list

-- 
  Keith T. Garner                                       [EMAIL PROTECTED]
  STR Consultant           http://www.str.com/             [EMAIL PROTECTED]
          "And to all the fly Linux coders out there workin on the
                   Deuce-Deuce, Peace...." -- Jason Beatty

Reply via email to