A reply from the same Hotspot engineer.
Paul
----
| Is there a reason why signal chaining is not enabled by default on
linux/solaris?
Just wanted to clarify - signal chaining is enabled by default for the JVM.
Signal chaining works cooperatively: when the JVM starts, it "chains",
i.e. saves, the existing signal handlers for the signals it uses. Then
when the JVM gets a signal, it first processes it itself and then
passes it to the next "chained" signal handler.
With cooperative chaining, if the AMD or NVIDIA drivers also supported
signal chaining (or, it sounds like, will support it :-), then when the
drivers register signal handlers, instead of overwriting existing
signal handlers, they would also save them in a chain and pass on signals.
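
To make the cooperative scheme above concrete, here is a minimal sketch in C. It is not Hotspot or driver code: first_handler, second_handler, install_chained and chaining_demo are hypothetical names used only for illustration. The component that registers second keeps the previous handler (returned by sigaction) and forwards the signal to it instead of discarding it.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Illustration-only counters: how often each handler ran. */
volatile sig_atomic_t first_count = 0;
volatile sig_atomic_t second_count = 0;

/* Whatever handler was installed before the second component registered. */
static struct sigaction saved_action;

/* Stands in for the component that registers first (for example the JVM). */
void first_handler(int sig)
{
    (void)sig;
    first_count++;
}

/* Stands in for a cooperative late registrant (for example a driver):
 * it does its own work, then forwards the signal down the chain. */
void second_handler(int sig, siginfo_t *info, void *context)
{
    second_count++;

    if (saved_action.sa_flags & SA_SIGINFO) {
        saved_action.sa_sigaction(sig, info, context);
    } else if (saved_action.sa_handler != SIG_DFL &&
               saved_action.sa_handler != SIG_IGN) {
        saved_action.sa_handler(sig);
    }
}

/* Register second_handler for `sig`, saving the previous handler instead
 * of discarding it. Returns 0 on success, -1 on failure. */
int install_chained(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = second_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(sig, &sa, &saved_action);
}

/* Install both handlers in order, then deliver one signal to this process.
 * Returns 0 if every step succeeded. */
int chaining_demo(int sig)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = first_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(sig, &sa, NULL) != 0)
        return -1;
    if (install_chained(sig) != 0)
        return -1;
    return raise(sig);
}
```

Calling chaining_demo(SIGUSR1) from a main function should leave both counters incremented. A driver that passes NULL as the third argument to sigaction and never forwards is the non-cooperative case described in this thread. The sketch skips the case where the saved disposition is SIG_DFL or SIG_IGN, which a real implementation would have to handle.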
libjsig.so, which HAS to be set via LD_PRELOAD, interposes on the
system calls for registering signal handlers. It has to be there
before any of the subsystems register signal handlers, and it provides
cooperative signal chaining for subsystems that do not provide it
themselves.
So, the best fix of course is the one you are pursuing, which is asking
Nvidia and AMD to also
perform cooperative signal chaining.
----
On 5/19/10 2:54 PM, Michael Bien wrote:
Hello Everyone,
(sorry for the delay, but something went wrong with my subscription, I
hadn't noticed that I already got an answer)
comments inline...
More info from Hotspot engineers.
----
>Does webstart allow running your own native code in an applet?
yes
(Does
plugin while
>So I am guessing that they have java interfaces using the jvm/JIT
> - then gluegen -- how does gluegen work here? Is it precompiled
gluegen generates very thin JNI binding code, it's precompiled ->
nothing at runtime.
but we have basically two modes: static linking against libOpenCL.so,
or dynamic loading and invocation via function pointers, which we
currently use for JOGL but not yet for JOCL (except for CL extensions).
>or does it do a translation at run time?
no
> - which talks to OpenCL "C" binaries
> - there appear to be a set running on the "host" or main CPU,
> including interfacing to the underlying device drivers, such as
> the amd and nvidia drivers mentioned
> - which then can also start OpenCL "C" binaries that run on
> auxiliary processors like GPUs
yes, that's basically how it works
>So to answer Michael's question from a VM perspective:
>It appears that the amd and nvidia native drivers that I would guess
>they link to in their "host" code register for the system signals
>listed below, but don't support signal chaining,
>i.e. they are overwriting the jvm's signal handlers.
>So - the technical solution for that, assuming we can't change the amd
>and nvidia drivers, is to interpose our libjsig.so before their
>libraries are loaded. This lets our vm chain their signal handlers, so
>that the VM only handles signals that apply to the vm and then
>calls their signal handlers.
>I am guessing they can't link libjsig with their application or he
>would have done so - but it is worth first asking why he can't.
Reading the libjsig doc I thought it would not work.
my understanding of libjsig:
it must be loaded before most of the JVM starts, that's why there
are two options:
1.) LD_PRELOAD=path/to/libjsig
2.) link a custom native JVM launcher against libjsig
since webstart allows neither, it won't help.
(please correct me if I am wrong here, otherwise the issue is already
solved :) )
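
For setups where the launch is under your control (so not webstart), option 1.) can also be applied from a small native wrapper. The following is a rough sketch under that assumption, not an existing launcher: prepend_ld_preload and exec_java_with_jsig are hypothetical names, and the libjsig path passed in is whatever your JVM installation provides. It puts the library in front of any existing LD_PRELOAD value and then replaces the process with java.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Put lib_path at the front of LD_PRELOAD, keeping any existing entries.
 * Returns 0 on success, -1 on failure. */
int prepend_ld_preload(const char *lib_path)
{
    const char *existing = getenv("LD_PRELOAD");
    char value[4096];
    int n;

    if (lib_path == NULL || strlen(lib_path) == 0)
        return -1;

    if (existing != NULL && existing[0] != '\0')
        n = snprintf(value, sizeof value, "%s:%s", lib_path, existing);
    else
        n = snprintf(value, sizeof value, "%s", lib_path);

    if (n < 0 || (size_t)n >= sizeof value)
        return -1;   /* formatting error or value too long */

    return setenv("LD_PRELOAD", value, 1);
}

/* Set LD_PRELOAD and replace this process with `java`.
 * java_argv must be NULL-terminated, with java_argv[0] set to "java".
 * Only returns (with -1) if something failed. */
int exec_java_with_jsig(const char *libjsig_path, char *const java_argv[])
{
    if (prepend_ld_preload(libjsig_path) != 0)
        return -1;
    execvp("java", java_argv);
    return -1;   /* reached only when execvp itself failed */
}
```

The environment change only affects the process started by execvp, not the wrapper itself, which is why the variable has to be set before java is launched.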
>If it is the case that he can not, then he needs to
>setenv LD_PRELOAD <libjvm.so-directory>/libjsig.so
>before starting up java.
>Is there a way to do that with WebStart? Is there a way to specify to
>do that?
No - there is no ability to set any env variables before launching
java. If the jnlp file itself is signed and trusted, you could set system
properties before launching java, but not environment variables.
exactly, that's the issue... dead end regarding webstart.
In the meantime I talked to the Nvidia devs and they will try to
work around this issue. Looking at the AMD drivers, the situation is no
different, but for some reason they appear to run more stably (this
is probably just luck). So we would have to talk to them too, and maybe
to other vendors.
Is there a reason why signal chaining is not enabled by default on
linux/solaris?
If there is a reason, could the webstart launcher enable it only for
webstart on those systems?
A third way to solve this issue is to introduce a
"-XX:enableSignalChaining" flag as mentioned before and allow it to be
passed via JNLP.
This might work since I heard webstart runs out of process anyway...
so there must be a launcher involved which interprets these flags
before JVM launch.
to quote my original mail:
It looks like a good self-defence mechanism for me :)
thanks for the support,
best regards,
Michael Bien
-----
Paul
> A partial answer: one of the Hotspot engineers says
>
> "I think the short answer is that chaining requires LD_PRELOAD to
> override the signal entry points. Otherwise we [Hotspot] wouldn't see
> the calls that change the signal handlers. If the Java command itself
> linked against jsig that would work too I think. I believe that's the
> only way to solve the problem he is seeing in an automatic fashion.
> Depending on how the driver library gets loaded they might be able to
> build their own signal handler trampolines to work around it and
> correct the signal handlers after it gets loaded."
>
> Regards,
>
> Paul
>
> On 5/8/10 7:31 AM, Michael Bien wrote:
>> Hello everyone,
>>
>> i am one of the maintainers of JOGL and wrote JOCL
>> (http://jogamp.org/) and we are currently facing some signal handling
>> issues caused by the nvidia and amd drivers.
>> (I got the hint to post to this list since there is no better alias
>> for this kind of topics)
>>
>> e.g. the nvidia OpenCL driver uses at least the following handlers:
>> Warning: SIGSEGV handler expected:libjvm.so+0x5d8cf0 found:libnvidia-compiler.so+0x1865e0
>> Warning: SIGILL handler expected:libjvm.so+0x5d8cf0 found:libnvidia-compiler.so+0x1865e0
>> Warning: SIGFPE handler expected:libjvm.so+0x5d8cf0 found:libnvidia-compiler.so+0x1865e0
>> Warning: SIGBUS handler expected:libjvm.so+0x5d8cf0 found:libnvidia-compiler.so+0x1865e0
>> Warning: SIGXFSZ handler expected:libjvm.so+0x5d8cf0 found:libnvidia-compiler.so+0x1865e0
>> (-Xcheck:jni)
>>
>> which basically makes the jvm unusable on Linux and leads to
>> segmentation faults (in the driver, I suppose the driver catches jvm
>> signals).
>>
>> LD_PRELOAD
>> (http://java.sun.com/javase/6/webnotes/trouble/TSG-VM/html/signals.html#gbzbl)
>> works perfectly but it is not allowed for webstart + applets...
>>
>> do you have any advice how we could workaround this issue? The
>> perfect solution would be a "-XX:enableSignalChaining" flag which we
>> could set via jnlp. Since the webstart JVM is out of process anyway
>> (since u10 or so) this would probably work.
>>
>> Why isn't signal chaining enabled by default on linux and solaris? It
>> looks like a good self-defence mechanism for me :)
>>
>> best regards,
>> Michael Bien
>>
>> ---
>>
>> http://michael-bien.com