Hi Thomas, Archie,
Generally, I would agree that a bit more cleanup/resetting of the signal
state makes sense.
There is general unease about changing this kind of global state that
has been stable for a long time.
An application that expects to change operating system state before
calling exec and expecting it to be retained/passed to the child is
expecting too much.
A change here should also be reviewed by Hotspot, signals are a
sensitive area for them too.
Regards, Roger
On 4/29/25 10:55 AM, Thomas Stüfe wrote:
Hi Archie!
On Tue, Apr 29, 2025 at 4:17 PM Archie Cobbs <archie.co...@gmail.com>
wrote:
This reminds me of a bug I saw once decades ago: the JVM was not
setting the close-on-exec flag on new file descriptors, so child
processes were receiving copies of "leaked" file descriptors and
keeping them open indefinitely, causing the system to eventually
running out of file descriptors and crash.
I remember. We fixed that in jspawnhelper.
The moral of that story was that control of the "process
environment" of a new process (i.e., which file descriptors are
open, signal handlers installed, etc.) created via the usual
fork/exec sequence clearly belongs to the bit of code that is
spawning that process for whatever reason: any inheritable part of
the process environment that the JVM might mess around with must
be reset before spawning child processes.
Agreed.
So if some native code wants to configure SIGPIPE for SIG_IGN and
then fork/exec a child process that inherits that setting, then fine.
But if the JVM itself is doing its fork/exec for some unrelated
purpose (e.g., Runtime.exec()), it has the right and
responsibility to clean up the environment that the child process
is going to inherit to protect it from any changes due to the JVM
process, including native code.
So it seems like there are two possible bugs here:
1. Some JNI code links in a library that changes the (global)
setting for SIGPIPE. Is that allowed by the JNI specification?
Does the specification even say? It seems like this is not really
kosher, but anyway it happens to work by coincidence because
SIG_IGN and javaSignalHandler do the same thing (i.e., discard the
signal).
To my knowledge the JNI spec says nothing about signals.
2. The JVM is not completely scrubbing the child's process
environment when it spawns a new process as it should: All signal
handlers should be reset to their default values (unless there is
some JVM-specific reason to set them differently).
It seems like #2 is a valid bug or at least a valid improvement,
and whether #1 is a bug depends on what you believe JNI libraries
are officially allowed to do.
I agree. Note that we also have a slight inconsistency depending on
the order of third-party signal handler installation:
1) In a scenario like the one I described we start java, it installs
the SIGPIPE handler, which then gets preplaced by the third-party lib
with SIG_IGN, resulting in child processes running with SIG_IGN for
SIGPIPE.
2) If, however,r the libjvm.so gets embedded into a custom launcher,
which already had set SIG_IGN for SIGPIPE, the JVM will replace that
handler with its own, resulting in child processes running with SIG_DFL.
This seems a bit arbitrary to me and seems to support the view that
this behavior is not intended.
-Archie
On Tue, Apr 29, 2025 at 4:05 AM Thomas Stüfe
<thomas.stu...@gmail.com> wrote:
Hi,
I would like to gauge opinions on whether the following
scenario is a bug to fix or whether to accept it as standard
behavior.
---
A customer has the following problem:
- The JVM invokes a third-party JNI library that sets the
signal disposition of SIGPIPE to SIG_IGN (Boo! in this case,
it is the FIPS nspr library, which does this unconditionally.)
- The JVM then spawns child processes
- All child processes now ignore SIGPIPE, which leads to
failures in automation scripts
I was surprised. I always assumed the signal disposition of
all signals would be reset to SIG_DFL when exec'ing. However,
seems I was wrong when it came to SIG_IGN. Posix documents for
execve() states [1]:
1) " Signals set to the default action (SIG_DFL) in the
calling process image shall be set to the default action in
the new process image."
and
2) "Except for SIGCHLD, signals set to be ignored (SIG_IGN) by
the calling process image shall be set to be ignored by the
new process image."
and
3) "Signals set to be caught by the calling process image
shall be set to the default action in the new process image
(see /<signal.h>/
<https://urldefense.com/v3/__https://pubs.opengroup.org/onlinepubs/009695399/basedefs/signal.h.html__;!!ACWV5N9M2RV99hQ!JGJYVEFhN3cSMzW41RLH_PK_mMddc3HnFEtCKYUXztE2-_9cjl_tIkMLO3y5_vav3Pt2ARimbCm44kuH8n_No6q2$>)."
(2) and (3) are the interesting parts. (2) means that if the
parent process ignores SIGPIPE, child processes will also
ignore SIGPIPE. (3) means that if the parent process has a
custom handler installed for SIGPIPE, child processes will be
started with SIG_DFL (default action) for SIGPIPE. The default
action for SIGPIPE is "terminate process".
The libjvm handles SIGPIPE. We install our
`javaSignalHandler`. We do this to eat up and ignore SIGPIPE.
That we "manually" ignore the signal beside the point here -
what counts is that we set a custom signal handler for
SIGPIPE. Therefore, on execve, we behave according to rule (3)
and start the child with SIG_DFL as SIGPIPE disposition. As it
should be.
If third-party code sets the handler to SIG_IGN as in this
example, exeve will behave according to rule (2) and the child
will start with SIG_IGN as SIGPIPE disposition.
The libjig can be used to workaround this scenario, but I
wonder if that is more of an accident. The libjsig.so will
preserve the JVM's SIGPIPE handler even if third-party code
attempts to set it to SIG_IGN. That means that exeve still
behaves according to rule (2): sets child's SIGPIPE
disposition to SIG_DFL.
----
But I wonder whether this should not be considered a bug to
fix regardless of the jsig.so workaround? In jspawnhelper, we
clean the environment from various effects when exec'ing;
among other things, we reset the signal block mask for the
process. The "ignore" state of processes could be considered
along the same line. We could reset all signal handlers to
SIG_DFL before execing the child.
I know that this area is super-tricky and problems are
notoriously difficult to analyze; we should therefore be
extremely careful not to break downward compatibility. Still,
what do people think? Should be fix this in jspawnhelper?
Thanks, Thomas
(cc'ing Roger)
[1]
https://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html
<https://urldefense.com/v3/__https://pubs.opengroup.org/onlinepubs/009695399/functions/exec.html__;!!ACWV5N9M2RV99hQ!JGJYVEFhN3cSMzW41RLH_PK_mMddc3HnFEtCKYUXztE2-_9cjl_tIkMLO3y5_vav3Pt2ARimbCm44kuH8vLbF0yj$>
--
Archie L. Cobbs