Dear Ramesh,
Thanks for the tips. I did make some progress, but did not solve
the problem.
Abstract:
Sending *any* signal that is blocked by the connection threads
(the only ones that attach to the JVM), but that is handled by the main
thread, unblocks the JVM. Until it blocks again, obviously.
Story:
AOLServer does not use SIGUSR1 internally. They rely on Linux
kernel threads which (AFAIK) use SIGUSR1 themselves.
AOLServer does however use SIGUSR2 for an undocumented(!) purpose
which I have determined to be the re-initialization of their internal TCL
interpreter. A signal handler is installed, and the signal is masked for
all threads other than main (which does signal handling). Fortunately,
hacking out the code that used SIGUSR2 was trivial. Unfortunately it did
not do the trick. I am experiencing the same problem.
I am not questioning your knowledge on the SIGUSR2 issue, but note
that other JVMs do not seem to use it. For instance Sun's JDK 1.2.2 for
Linux does NOT use SIGUSR2. I have checked the source. Unfortunately their
implementation of native threads on Linux is broken in other ways (stack
corruption on pthread_join), so I cannot use it.
I did notice however that sending a SIGUSR2 to the unmodified
AOLServer has the same curative effect as SIGHUP. After I hacked out the
SIGUSR2 code from AOLServer, sending a SIGUSR2 no longer had any effect.
This tells me that SIGUSR2 *may* be part of the problem, but it's
definitely not all of it.
This led me to generalize that sending any signal that is blocked
by the connection threads, but handled in the main thread, would unblock
the JVM. I tried with SIGINT and SIGTERM; just before the server goes
down, the request is nicely served. The only one that doesn't do the trick
is SIGPIPE. This is because the AOLServer does not install a handler for
that, it only blocks it.
I have tried IBM JDK 1.1.8 and the behavior is identical.
Question: where do I go from here? I cannot hack out the SIGHUP or
SIGINT code, it would make the AOLServer not usable as a daemon.
TIA,
Vasile
On Wed, 17 May 2000, Ramesh Thummala wrote:
> Vasile,
> I had some painful experiences with the Blackdown JDK 1.2.2-RC4 with respect to
>signals. It looks like Blackdown JVM is using SIGUSR1 and SIGUSR2 to suspend the JVM
>threads while the garbage collector starts collecting garbage. During this process, the rest
>of the JVM threads (except the garbage collector) will call a signal handler which will
>wait on a semaphore until the garbage collector finishes and notifies all of them.
>This is the reason why your JVM hangs at random places. When I was debugging this
>problem, my JVM used to hang in an assignment statement (of all things to hang on :-(
>).
> If you make sure you do not mask your SIGUSR1 and SIGUSR2, you should be able
>to run fine. However, there is one caveat. Linux threads use SIGUSR1 and SIGUSR2 for
>their internal communication in some products. If your product also uses them, you
>should look for alternatives.
>
> Hope this helps,
> Ramesh Thummala
>
> >>> Vasile GABURICI <[EMAIL PROTECTED]> 05/17/00 03:55PM >>>
> Hello Blackdown developers,
>
> Abstract of this rather long message:
>
> What exactly is allowed and what is not with respect to signals in
> a multithreaded application that uses the Blackdown 1.2.2-RC4 JDK on
> Linux? I am experiencing random thread hangs inside the JVM. When that
> happens the entire JVM hangs, no other thread may attach to it, but the
> rest of the threads (that don't use the JVM) continue to run just fine.
> Why do I suspect that this is related to signals? Well, you'll have to
> read the long version below:
>
> I am developing a plug-in for AOLServer to run Tomcat in-process.
> For those of you who are not familiar with these two, AOLServer is a
> multithreaded, open source web server written in C and Tomcat is a servlet
> engine (written in Java).
>
> I am developing this on Linux 2.2.14 with glibc 2.1.3 and the
> Blackdown JDK 1.2.2-RC4. On Linux, AOLServer uses kernel threads, like
> the Blackdown JDK (with native threads). So far this should be safe.
>
> The plug-in instantiates a JVM inside the AOLServer process.
> Threads that handle requests for servlets attach themselves to the JVM,
> and call a method in the servlet engine. The request is then processed on
> the same thread, but inside the JVM, which calls back the web server to
> write out the response. A request is handled this way on a single thread.
> ASCII drawing (WS = web server, SE = servlet engine):
>
> request --> [ C function in WS ] --> [ Java method in SE ] -->
> --> [ C callback in WS ] --> response
>
> All this works fine, except that in some instances the threads
> that process requests hang *at random points* inside the JVM. Usually a
> thread hangs when it calls a method (not JNI, a Java to Java call). The
> methods called are *not* synchronized. I know for a fact that it happens at
> truly random points. I have done extensive logging on the Java side using
> synchronous writes. There are no locks involved (synchronized blocks or
> methods). Also, the servlet engine (Tomcat) is just fine when it runs
> outside the web server.
>
> Other people who are working on a module for Apache 2.0a have
> encountered the same problem (but they work in a three-letter company and
> won't ask here for help).
>
> Why do I ask about signals? Well, the really weird thing is that
> when I send a SIGINT to the server to shut it down, *all* threads that are
> blocked in the JVM suddenly awake and complete normally, sending every bit
> of the response to the client. AOLServer has a nice shutdown procedure
> that waits a while for threads to finish their work. It does however
> kill them if they don't finish in a given interval.
>
> Thus, I came to the conclusion that it must be related to signals.
> So, I started to dig into this issue and I found a stern warning in the
> Blackdown FAQ:
>
> * Native code using JNI should NOT modify the signal processing state.
> The VM uses signals and any change to the signal handling may result
> in VM failures.
>
>
> Now, any decent web server (or daemon) has to intercept some
> signals. On the AOLServer side the source is nicely commented. The main
> thread does the following:
>
> /*
> * Block SIGHUP, SIGPIPE, SIGTERM, and SIGINT. This mask is
> * inherited by all subsequent threads so that only this
> * thread will catch the signals in the sigwait() loop below.
> * Unfortunately this makes it impossible to kill the
> * server with a signal other than SIGKILL until startup
> * is complete.
> */
>
> I hope that you guys that wrote the Linux specific JVM can tell me
> if this is the reason why threads hang. If so, what should I do?
>
> Less vague questions:
>
> 1) What *exactly* is permitted and what not with respect to
> signals, threads and masks? You cannot reasonably expect a web server not
> to deal with any signals...
>
> 2) What exactly does the -Xrs flag do? How does it "reduce the use of
> OS signals"? Can this help given the list of signals AOLServer uses?
>
> 3) What if I pass the request to a different Java-only thread in
> the JVM (let's call this a proxy thread) and make the C-Java thread that
> handles the connection wait on the C side until the proxy thread calls back
> the C side with the results? This would require some work, so I'd like to
> hear your opinion on it. Could it make any difference?
>
> The AOLServer is open source, so I can hack it any way I like, but
> that's not true for the JVM. Clearly a web server that wouldn't handle
> any signals is unlikely to be popular, so I can't just hack them out. A
> finer solution is needed.
>
> I have also tried the IBM 1.1.8 JDK and found the same problem. So
> the issue might be inherited from Sun's signaling code. But you guys seem
> aware of the problem; you're the only ones that mention it in the FAQ.
>
> Finally, if you think I am asking for too much, please note that I
> am making the plug-in available for free, and I don't get paid to write
> it, so I can't spend any money on Java consulting :-) You can get your
> own JVMs to hang on it at http://www.ss.pub.ro/~gaburici/nstomcat/
>
>
> TIA,
> Vasile
>
>
>
> ----------------------------------------------------------------------
> To UNSUBSCRIBE, email to [EMAIL PROTECTED]
> with a subject of "unsubscribe". Trouble? Contact [EMAIL PROTECTED]
>
>