Thanks, Roger, I will proceed with removing the old workaround then. On Tue, Jul 15, 2025 at 5:05 PM Roger Riggs <roger.ri...@oracle.com> wrote:
> Hi Thomas, > > Simpler is better on both sides of the protocol. > The version check will have happened before this part of the protocol so > there's no confusion about matching expectations. > I agree that removing it is preferred. > > Roger > > > On 7/15/25 10:44 AM, Thomas Stüfe wrote: > > Hi, > > > > I am currently working on removing (eventually) the vfork mode. Before > > we can do this, we need a bit better error diagnostics. I do this by > > gently improving the error handling throughout the code, so that we > > can generate IOExceptions based on more exact knowledge. > > > > While working at this, I re-examined the "send aliveness ping from > > jspawnhelper to parent" logic again I introduced back in 2019 as a > > workaround for an obscure glibc bug with posix_spawn (see > > https://bugs.openjdk.org/browse/JDK-8223777). I found that it was not > > needed anymore since the glibc was fixed, so I started removing that > > workaround (https://bugs.openjdk.org/browse/JDK-8362257). > > > > But then it occurred to me that an obscure part of the jspawnhelper > > diagnostics introduced with > > https://bugs.openjdk.org/browse/JDK-8226242 piggy-backs on the > > aliveness ping for its > > "detect-abnormal-process-termination-before-exec" capabilities. Works > > like this: > > > > A jspawnhelper starts > > B jspawnhelper enters childProcess() and sends alivenes ping > > C jspawnhelper does a bunch of other things > > D jspawnhelper exec's > > > > In the parent, we count abnormal child terminations that occur before > > the aliveness ping (B) as "spawn error" and print the signal number. > > Without the aliveness ping we could not tell apart "jspawnhelper ends > > abnormally due to a signal" from "jspawnhelper exec()'s the user > > program successfully, user program runs and ends abnormally due to > > signal". > > > > However, the question is how important or even useful this part really > is: > > - for externally sent abnormal termination signals (SIGTERM etc), from > > the caller's point of view it probably does not matter when it comes : > > before or after exec(). > > - it only matters for synchronous termination signals (crashes) we > > ourselves cause; but here it only covers crashes in a rather small > > piece of code, before the liveness ping (B). Basically, just the first > > part of jspawnhelper main(). Any crashes after the liveness ping are > > still unrecognised by ProcessBuilder.start, and are instead handled by > > the caller calling Process.waitFor(). > > > > There are two ways to deal with this: > > > > We could do without the crash protection in point (A), which would > > allow us to remove the liveness ping. I would very much prefer that. > > It would simplify the jspawnhelper protocol and make it more robust. > > Because we now don't have any communication in case no error happens - > > there would be only a single bit of information sent back via fail > > pipe, and only in case of an error. Fail pipe would stay quiet in case > > of successful exec(). Abnormal child process termination in the first > > part of jspawnhelper would be covered by the same path that detects > > abnormal child process termination in user programs - Process.waitFor(). > > > > If we determine we really need this crash detection, we could at least > > improve it: move the liveness ping to just-before the exec() call, so > > that we cover all the area from A-D. Also, do it for all modes (FORK, > > too), to simplify coding. > > > > Bottomline: remove an obscure and complex small mechanism that only > > helps a bit with detecting program errors (sigsegv etc) inside the > > first part of jspawnhelper main() ? > > > > Thanks, Thomas > > > > > >