Hi,

I am currently working on removing (eventually) the vfork mode. Before we
can do this, we need a bit better error diagnostics. I do this by gently
improving the error handling throughout the code, so that we can generate
IOExceptions based on more exact knowledge.

While working at this, I re-examined the "send aliveness ping from
jspawnhelper to parent" logic again I introduced back in 2019 as a
workaround for an obscure glibc bug with posix_spawn (see
https://bugs.openjdk.org/browse/JDK-8223777). I found that it was not
needed anymore since the glibc was fixed, so I started removing that
workaround (https://bugs.openjdk.org/browse/JDK-8362257).

But then it occurred to me that an obscure part of the jspawnhelper
diagnostics introduced with https://bugs.openjdk.org/browse/JDK-8226242
piggy-backs on the aliveness ping for its
"detect-abnormal-process-termination-before-exec" capabilities. Works like
this:

A jspawnhelper starts
B jspawnhelper enters childProcess() and sends alivenes ping
C jspawnhelper does a bunch of other things
D jspawnhelper exec's

In the parent, we count abnormal child terminations that occur before the
aliveness ping (B) as "spawn error" and print the signal number.  Without
the aliveness ping we could not tell apart "jspawnhelper ends abnormally
due to a signal" from "jspawnhelper exec()'s the user program successfully,
user program runs and ends abnormally due to signal".

However, the question is how important or even useful this part really is:
- for externally sent abnormal termination signals (SIGTERM etc), from the
caller's point of view it probably does not matter when it comes : before
or after exec().
- it only matters for synchronous termination signals (crashes) we
ourselves cause; but here it only covers crashes in a rather small piece of
code, before the liveness ping (B). Basically, just the first part of
jspawnhelper main(). Any crashes after the liveness ping are still
unrecognised by ProcessBuilder.start, and are instead handled by the caller
calling Process.waitFor().

There are two ways to deal with this:

We could do without the crash protection in point (A), which would allow us
to remove the liveness ping. I would very much prefer that. It would
simplify the jspawnhelper protocol and make it more robust. Because we now
don't have any communication in case no error happens - there would be only
a single bit of information sent back via fail pipe, and only in case of an
error. Fail pipe would stay quiet in case of successful exec(). Abnormal
child process termination in the first part of jspawnhelper would be
covered by the same path that detects abnormal child process termination in
user programs - Process.waitFor().

If we determine we really need this crash detection, we could at least
improve it: move the liveness ping to just-before the exec() call, so that
we cover all the area from A-D. Also, do it for all modes (FORK, too), to
simplify coding.

Bottomline: remove an obscure and complex small mechanism that only helps a
bit with detecting program errors (sigsegv etc) inside the first part of
jspawnhelper main() ?

Thanks, Thomas

Reply via email to