Hi Vasile,

    Vasile> The AOLServer code for signal handling is very simple. The
    Vasile> functions NsBlockSignals and NsHandleSignals are included
    Vasile> below. These functions contain all the signal handling
    Vasile> that is relevant to this issue.  AOLServer does more funny
    Vasile> things when it exec()s an external program, but I am not
    Vasile> exec()ing anything, so the code below is all that applies.

    Vasile> The main() function first calls NsBlockSignals, then
    Vasile> starts a thread that will receive and dispatch connections
    Vasile> by launching new connection threads. After that, main()
    Vasile> calls NsHandleSignals where it stays until the server is
    Vasile> shut down. Therefore, all threads that will call the JVM
    Vasile> will have the SIGHUP, SIGPIPE, SIGTERM, SIGINT and SIGUSR2
    Vasile> signals masked.
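
That inheritance is easy to check in isolation. A minimal sketch,
assuming POSIX threads: block_server_signals() is a hypothetical
stand-in for NsBlockSignals (it is not the AOLServer code), and the
other names are made up for illustration as well. It blocks the five
signals in the calling thread and then asks a freshly created thread
how many of them it sees blocked.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* The signals named in the description above. */
static const int server_signals[] = { SIGHUP, SIGPIPE, SIGTERM, SIGINT, SIGUSR2 };
enum { N_SERVER_SIGNALS = sizeof server_signals / sizeof server_signals[0] };

/* Hypothetical stand-in for NsBlockSignals: block the set in the caller. */
static int block_server_signals(void)
{
    sigset_t set;
    int i;

    sigemptyset(&set);
    for (i = 0; i < N_SERVER_SIGNALS; i++)
        sigaddset(&set, server_signals[i]);
    return pthread_sigmask(SIG_BLOCK, &set, NULL);
}

/* Thread body: count how many server signals are blocked in this thread. */
static void *count_blocked(void *arg)
{
    int *count = arg;
    sigset_t cur;
    int i;

    assert(count != NULL);
    *count = 0;
    /* A NULL new set only queries the current mask. */
    pthread_sigmask(SIG_BLOCK, NULL, &cur);
    for (i = 0; i < N_SERVER_SIGNALS; i++)
        if (sigismember(&cur, server_signals[i]) == 1)
            (*count)++;
    return NULL;
}

/* Block in the caller, then report what a new thread inherited (-1 on error). */
static int blocked_in_new_thread(void)
{
    pthread_t t;
    int count = -1;

    if (block_server_signals() != 0)
        return -1;
    if (pthread_create(&t, NULL, count_blocked, &count) != 0)
        return -1;
    pthread_join(t, NULL);
    return count;
}
```

A thread created after the mask is set starts with a copy of its
creator's mask, so blocked_in_new_thread() should report all five
signals as blocked.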

    Vasile> One cannot expect a web server not to handle signals like
    Vasile> SIGHUP or TERM, so I don't see why this is "bad, bad, bad"
    Vasile> in principle. The fact that the JVM expects to run alone
    Vasile> and handle all signals itself is IMHO "bad, ugly, nasty"
    Vasile> :-) 

We don't say "don't use signals"; we say "don't modify the signal
processing state".

    Vasile> I think that the AOLServer does the right thing in
    Vasile> protecting threads other than main from signals...

    Vasile> I am starting the JVM from main() between the calls to
    Vasile> NsBlockSignals and NsHandleSignals. That's because all
    Vasile> modules get initialized there. So, it should see that some
    Vasile> signals are blocked. But I guess that it doesn't check the
    Vasile> mask...

    Vasile> Any hints from here on?

Christopher is right: sigwait(3) is the problem.

You create a JVM in your main thread, which automatically makes that
thread a valid Java thread.  Creating the JVM also starts some other
threads, most notably a GC thread.  I assume you also create at least
one additional Java thread from the created JVM.

Then you block the main thread (which is also a Java thread) with
sigwait().  At some point a GC will have to take place.  The GC has
to suspend all Java threads, which is implemented with a signal-based
suspend/resume scheme (based on a bug-fixed version of Dave Butenhof's
example in his 'Programming with POSIX Threads' book).  But the GC
fails to suspend one thread, your main thread, which is blocked in
sigwait()!  And that's your deadlock.

If you shut down the server by sending SIGTERM, the main thread gets
out of the sigwait() and the GC can finally suspend it, do its work,
and then resume all Java threads.

The solution is not to call sigwait() in a Java thread.  In your case,
create the JVM in a separate thread rather than in the primordial
thread.
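
A rough sketch of that layout, assuming POSIX threads.  All names here
(jvm_host, jvm_thread_main, block_shutdown_signals, run_server) are
made up for illustration, the signal set is shortened, and the actual
JNI_CreateJavaVM() call is only indicated in a comment so that the
sketch builds without a JDK:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* State shared with the dedicated JVM thread (hypothetical). */
struct jvm_host {
    pthread_t thread;
    int started;    /* set to 1 once the JVM thread has run */
};

/* Body of the dedicated thread.  A real server would call
 * JNI_CreateJavaVM() from <jni.h> here, so that this thread, and not
 * the primordial one, becomes the Java thread. */
static void *jvm_thread_main(void *arg)
{
    struct jvm_host *host = arg;

    assert(host != NULL);
    host->started = 1;
    return NULL;
}

/* Block the shutdown signals before any thread exists, so that every
 * thread created later inherits the mask. */
static int block_shutdown_signals(sigset_t *set)
{
    sigemptyset(set);
    sigaddset(set, SIGHUP);
    sigaddset(set, SIGINT);
    sigaddset(set, SIGTERM);
    return pthread_sigmask(SIG_BLOCK, set, NULL);
}

/* Start the JVM thread, then park the caller in sigwait().
 * Returns the signal that ended the wait, or -1 on error. */
static int run_server(struct jvm_host *host)
{
    sigset_t set;
    int sig = 0;

    if (block_shutdown_signals(&set) != 0)
        return -1;
    host->started = 0;
    if (pthread_create(&host->thread, NULL, jvm_thread_main, host) != 0)
        return -1;
    /* The caller never became a Java thread, so the GC does not need
     * to suspend it while it waits here. */
    if (sigwait(&set, &sig) != 0)
        return -1;
    pthread_join(host->thread, NULL);
    return sig;
}
```

main() would call run_server() where it now calls NsBlockSignals and
NsHandleSignals.  If the primordial thread ever has to call into Java,
it would need to attach to the JVM first, and then it must stay out of
sigwait() again.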


        Juergen

-- 
Juergen Kreileder, Blackdown Java-Linux Team
http://www.blackdown.org/java-linux.html
JVM'01: http://www.usenix.org/events/jvm01/

