Hi Thomas,
Simpler is better on both sides of the protocol.
The version check will have happened before this part of the protocol so
there's no confusion about matching expectations.
I agree that removing it is preferred.
Roger
On 7/15/25 10:44 AM, Thomas Stüfe wrote:
Hi,
I am currently working on removing (eventually) the vfork mode. Before
we can do this, we need a bit better error diagnostics. I do this by
gently improving the error handling throughout the code, so that we
can generate IOExceptions based on more exact knowledge.
While working at this, I re-examined the "send aliveness ping from
jspawnhelper to parent" logic again I introduced back in 2019 as a
workaround for an obscure glibc bug with posix_spawn (see
https://bugs.openjdk.org/browse/JDK-8223777). I found that it was not
needed anymore since the glibc was fixed, so I started removing that
workaround (https://bugs.openjdk.org/browse/JDK-8362257).
But then it occurred to me that an obscure part of the jspawnhelper
diagnostics introduced with
https://bugs.openjdk.org/browse/JDK-8226242 piggy-backs on the
aliveness ping for its
"detect-abnormal-process-termination-before-exec" capabilities. Works
like this:
A jspawnhelper starts
B jspawnhelper enters childProcess() and sends alivenes ping
C jspawnhelper does a bunch of other things
D jspawnhelper exec's
In the parent, we count abnormal child terminations that occur before
the aliveness ping (B) as "spawn error" and print the signal number.
Without the aliveness ping we could not tell apart "jspawnhelper ends
abnormally due to a signal" from "jspawnhelper exec()'s the user
program successfully, user program runs and ends abnormally due to
signal".
However, the question is how important or even useful this part really is:
- for externally sent abnormal termination signals (SIGTERM etc), from
the caller's point of view it probably does not matter when it comes :
before or after exec().
- it only matters for synchronous termination signals (crashes) we
ourselves cause; but here it only covers crashes in a rather small
piece of code, before the liveness ping (B). Basically, just the first
part of jspawnhelper main(). Any crashes after the liveness ping are
still unrecognised by ProcessBuilder.start, and are instead handled by
the caller calling Process.waitFor().
There are two ways to deal with this:
We could do without the crash protection in point (A), which would
allow us to remove the liveness ping. I would very much prefer that.
It would simplify the jspawnhelper protocol and make it more robust.
Because we now don't have any communication in case no error happens -
there would be only a single bit of information sent back via fail
pipe, and only in case of an error. Fail pipe would stay quiet in case
of successful exec(). Abnormal child process termination in the first
part of jspawnhelper would be covered by the same path that detects
abnormal child process termination in user programs - Process.waitFor().
If we determine we really need this crash detection, we could at least
improve it: move the liveness ping to just-before the exec() call, so
that we cover all the area from A-D. Also, do it for all modes (FORK,
too), to simplify coding.
Bottomline: remove an obscure and complex small mechanism that only
helps a bit with detecting program errors (sigsegv etc) inside the
first part of jspawnhelper main() ?
Thanks, Thomas