Dear Juergen,
Thanks a lot for the first-class analysis and solution for this
problem. You've made my day :-) I've implemented the fix you proposed, and
all is fine now with Blackdown JDK 1.2.2.
Apparently, I've been living on another planet lately, because
I've missed the introduction of real time signals in glibc 2.1 ;-( Too bad
that the only notice went into the NEWS file. No documentation so far...
When I was strace-ing the JDK, it seemed strange to me that I was getting
all the SIGRT_x stuff...
Unfortunately, my work is now completely broken with IBM JDK
1.1.8. Beats me why. IBM's baby requires glibc 2.1 as well, so I suppose
they must have used real time signals to do things. Threads hang when they
try to attach to the JVM. Because the sigwait()ing main thread is no
longer a Java thread, sending a signal to it does nothing. I'm not trying
to push my luck here and find from you what IBM did with their VM, but
perhaps you have a clue.
I am not a big fan of JDK 1.1.x, but Tomcat is required to work
both on JDK 1.2 and 1.1 currently. My little hack, however, is not.
So, the only "downside" is that people will have to use Blackdown JDK :-)
> Then you block the main thread (which also is a Java thread) with
> sigwait(). At some point the GC will have to take place: The GC has
> to suspend all Java threads, this is implemented with a signal based
> suspend/resume scheme (based on a bug-fixed version of Dave Butenhof's
> example in his 'Programming with POSIX Threads' book). But the GC
> fails to suspend one thread, your main thread, which is blocked in
> sigwait()! And that's your deadlock.
> If you shut down the server by sending SIGTERM, the main thread gets
> out of the sigwait() and the GC finally can suspend it, do its work,
> and then resume all Java threads.
I'm not familiar with the book you mention, but I understand that
a real time signal is sent to suspend the thread. I didn't check the glibc
source, but I guess that sigwait() blocks real time signals as well.
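To make the guess above concrete, here is a minimal sketch of what happens when a suspend signal reaches a thread that has it blocked. It is not the JDK's actual suspend/resume code. The helper name demo_blocked_suspend_signal and the handler are hypothetical, and SIGRTMIN + 3 is used only because the quoted text names it. For simplicity the sketch is single-threaded and uses sigprocmask() and raise(); a threaded program would use pthread_sigmask() and pthread_kill().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Set by the handler; stands in for "this thread acknowledged the suspend". */
static volatile sig_atomic_t suspend_seen = 0;

static void on_suspend(int signo)
{
    (void)signo;
    suspend_seen = 1;
}

/*
 * Hypothetical helper: raise the suspend signal while it is blocked,
 * record whether the handler ran, then unblock it and record again.
 * Returns 0 on success, -1 on error.
 */
int demo_blocked_suspend_signal(int *seen_while_blocked, int *seen_after_unblock)
{
    int signo = SIGRTMIN + 3;   /* suspend signal named in the quoted text */
    struct sigaction sa, old_sa;
    sigset_t set;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_suspend;
    sigemptyset(&sa.sa_mask);
    if (sigaction(signo, &sa, &old_sa) != 0)
        return -1;

    sigemptyset(&set);
    sigaddset(&set, signo);
    if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;

    suspend_seen = 0;
    if (raise(signo) != 0)
        return -1;
    /* The signal is only pending here: the handler has not run. */
    *seen_while_blocked = suspend_seen;

    if (sigprocmask(SIG_UNBLOCK, &set, NULL) != 0)
        return -1;
    /* Unblocking delivers the pending signal before sigprocmask() returns. */
    *seen_after_unblock = suspend_seen;

    sigaction(signo, &old_sa, NULL);   /* restore the previous disposition */
    return 0;
}

#ifdef DEMO_MAIN
int main(void)
{
    int blocked = -1, unblocked = -1;
    assert(demo_blocked_suspend_signal(&blocked, &unblocked) == 0);
    printf("handler ran while blocked: %d, after unblock: %d\n",
           blocked, unblocked);
    return 0;
}
#endif
```

Compile with -DDEMO_MAIN for a standalone run. If the waiting thread really does keep the suspend signal blocked, a collector waiting for that thread's acknowledgement would wait for as long as the thread stays blocked, which would match the deadlock described in the quoted text.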
> The native threads implementation uses SIGRTMIN + 3 and SIGRTMIN + 4
> and SIGPIPE.
When I was strace-ing my old broken code (that had the "advantage"
of catching the Java signals in its main thread), I saw stuff like this:
rt_sigaction(SIGRT_6, {0x40027570, ~[], SA_RESTART|SA_SIGINFO|0x4000000}, NULL, 8) = 0
rt_sigaction(SIGRT_7, {0x40027570, ~[], SA_SIGINFO|0x4000000}, NULL, 8)=0
when the JVM starts. Then came a lot of:
--- SIGRT_6 (Real-time signal 6) ---
rt_sigsuspend(~[RT_7] <unfinished ...>
--- SIGRT_7 (Real-time signal 7) ---
<... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
rt_sigreturn(0xbfffdffc) = -1 EINTR (Interrupted system call)
rt_sigreturn(0xbfffe240) = -1 EINTR (Interrupted system call)
Unless I'm missing something, this means SIGRTMIN+6 and SIGRTMIN+7.
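One possible reconciliation with the SIGRTMIN + 3 and SIGRTMIN + 4 figures quoted above, offered as an assumption and not verified against this particular strace or glibc build: strace may count SIGRT_n from the kernel's first real-time signal (32 on Linux), while glibc's SIGRTMIN macro can evaluate to a higher number because the thread library keeps the first few real-time signals for its own use. If SIGRTMIN were 35 on that system, SIGRT_6 and SIGRT_7 would be exactly SIGRTMIN + 3 and SIGRTMIN + 4. The sketch below does the arithmetic; the constant KERNEL_FIRST_RT_SIGNAL and both function names are hypothetical.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>

/* Assumption: the kernel's first real-time signal is 32, and strace's
 * SIGRT_n label means signal number 32 + n. */
#define KERNEL_FIRST_RT_SIGNAL 32

/* Signal number that a strace label SIGRT_<strace_index> would refer to. */
int strace_index_to_signo(int strace_index)
{
    return KERNEL_FIRST_RT_SIGNAL + strace_index;
}

/* Offset of a signal number from the C library's SIGRTMIN on this system. */
int signo_to_sigrtmin_offset(int signo)
{
    return signo - SIGRTMIN;
}

#ifdef DEMO_MAIN
int main(void)
{
    int n;
    /* SIGRTMIN is evaluated at run time and differs between C libraries. */
    printf("SIGRTMIN on this system: %d\n", SIGRTMIN);
    for (n = 6; n <= 7; n++) {
        int signo = strace_index_to_signo(n);
        printf("SIGRT_%d -> signal %d -> SIGRTMIN%+d\n",
               n, signo, signo_to_sigrtmin_offset(signo));
    }
    return 0;
}
#endif
```

Compiled with -DDEMO_MAIN, the program prints the local SIGRTMIN and the resulting offsets, which is a quick way to check whether the two numbering schemes differ on a given machine.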
BTW, do you know how to strace threads other than main on Linux?
This could be very useful in situations like the ones I've encountered.
Again, thanks a lot for your invaluable help.
Cheers,
Vasile