Thanks, Roger, I will proceed with removing the old workaround then.

On Tue, Jul 15, 2025 at 5:05 PM Roger Riggs <roger.ri...@oracle.com> wrote:

> Hi Thomas,
>
> Simpler is better on both sides of the protocol.
> The version check will have happened before this part of the protocol so
> there's no confusion about matching expectations.
> I agree that removing it is preferred.
>
> Roger
>
>
> On 7/15/25 10:44 AM, Thomas Stüfe wrote:
> > Hi,
> >
> > I am currently working on removing (eventually) the vfork mode. Before
> > we can do this, we need a bit better error diagnostics. I do this by
> > gently improving the error handling throughout the code, so that we
> > can generate IOExceptions based on more exact knowledge.
> >
> > While working at this, I re-examined the "send aliveness ping from
> > jspawnhelper to parent" logic again I introduced back in 2019 as a
> > workaround for an obscure glibc bug with posix_spawn (see
> > https://bugs.openjdk.org/browse/JDK-8223777). I found that it was not
> > needed anymore since the glibc was fixed, so I started removing that
> > workaround (https://bugs.openjdk.org/browse/JDK-8362257).
> >
> > But then it occurred to me that an obscure part of the jspawnhelper
> > diagnostics introduced with
> > https://bugs.openjdk.org/browse/JDK-8226242 piggy-backs on the
> > aliveness ping for its
> > "detect-abnormal-process-termination-before-exec" capabilities. Works
> > like this:
> >
> > A jspawnhelper starts
> > B jspawnhelper enters childProcess() and sends alivenes ping
> > C jspawnhelper does a bunch of other things
> > D jspawnhelper exec's
> >
> > In the parent, we count abnormal child terminations that occur before
> > the aliveness ping (B) as "spawn error" and print the signal number.
> > Without the aliveness ping we could not tell apart "jspawnhelper ends
> > abnormally due to a signal" from "jspawnhelper exec()'s the user
> > program successfully, user program runs and ends abnormally due to
> > signal".
> >
> > However, the question is how important or even useful this part really
> is:
> > - for externally sent abnormal termination signals (SIGTERM etc), from
> > the caller's point of view it probably does not matter when it comes :
> > before or after exec().
> > - it only matters for synchronous termination signals (crashes) we
> > ourselves cause; but here it only covers crashes in a rather small
> > piece of code, before the liveness ping (B). Basically, just the first
> > part of jspawnhelper main(). Any crashes after the liveness ping are
> > still unrecognised by ProcessBuilder.start, and are instead handled by
> > the caller calling Process.waitFor().
> >
> > There are two ways to deal with this:
> >
> > We could do without the crash protection in point (A), which would
> > allow us to remove the liveness ping. I would very much prefer that.
> > It would simplify the jspawnhelper protocol and make it more robust.
> > Because we now don't have any communication in case no error happens -
> > there would be only a single bit of information sent back via fail
> > pipe, and only in case of an error. Fail pipe would stay quiet in case
> > of successful exec(). Abnormal child process termination in the first
> > part of jspawnhelper would be covered by the same path that detects
> > abnormal child process termination in user programs - Process.waitFor().
> >
> > If we determine we really need this crash detection, we could at least
> > improve it: move the liveness ping to just-before the exec() call, so
> > that we cover all the area from A-D. Also, do it for all modes (FORK,
> > too), to simplify coding.
> >
> > Bottomline: remove an obscure and complex small mechanism that only
> > helps a bit with detecting program errors (sigsegv etc) inside the
> > first part of jspawnhelper main() ?
> >
> > Thanks, Thomas
> >
> >
>
>

Reply via email to